
Category: ETL / ELT

AWS DMS and a LOB Bug

Richard O’Riordan fixes an issue:

The table over in our Postgres cluster is similar, except for the data type “text” being used instead of “varchar”. All kinds of boring so far, but what we noticed was that, on some very rare occasions, the “largevalue” column was empty over in the PostgreSQL database even though for that row it was populated in SQL Server.

This seemed odd to me: you would expect that if there were some error inserting the row on the PostgreSQL side, then, since it is all done within a transaction, it would either all succeed or all fail. So how would the row be partially inserted, i.e. missing this text value?

Read on for the story and several tests of how things work.
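Not part of Richard’s post, but if you want to hunt for this kind of discrepancy yourself, comparing the two sides directly is one way to do it. Here is a minimal Python sketch, assuming placeholder table, column, and connection details (dbo.SomeTable, id, and largevalue are stand-ins), with pyodbc and psycopg2 as my driver choices rather than anything from the article:

```python
# Hypothetical sketch: find rows whose text/LOB column is populated in SQL Server
# but arrived empty in PostgreSQL after the DMS migration.
# Table, column, and connection details are placeholders, not from the post.
import pyodbc      # SQL Server
import psycopg2    # PostgreSQL

mssql = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...")
pg = psycopg2.connect("host=... dbname=... user=... password=...")

# Keys of rows that are populated on the source
src = mssql.cursor()
src.execute("SELECT id FROM dbo.SomeTable WHERE largevalue IS NOT NULL AND LEN(largevalue) > 0")
populated_on_source = {row[0] for row in src.fetchall()}

# Keys of rows that arrived empty on the target
tgt = pg.cursor()
tgt.execute("SELECT id FROM sometable WHERE largevalue IS NULL OR largevalue = ''")
empty_on_target = {row[0] for row in tgt.fetchall()}

suspects = populated_on_source & empty_on_target
print(f"{len(suspects)} row(s) lost their large value in flight")
```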


Setting a Default Destination for Fabric Dataflows Gen2

Jon Voge wants to spend less time copying and pasting:

Ever had a Dataflow Gen2 in which you needed to map the output of several queries to the same Warehouse or Lakehouse? Takes a while to set up, right?

If you wish to add a Default Destination to your Dataflow, all you need to do is create the Dataflow from inside your desired destination. This works for Warehouses, Lakehouses, and KQL Databases:

Click through for an example of how it works.


Execute a Collection of Child Pipelines from Metadata in Data Factory

Andy Leonard continues a series on design patterns:

In this post, I clone and modify the dynamic parent pipeline from the previous post to retrieve metadata from an Azure SQL database table for several child pipelines, and then call each child pipeline from a parent pipeline.

When we’re done, this pipeline will:

  1. Read pipeline metadata from a table in an Azure SQL database
  2. Store some of the metadata (a collection of pipelineID values) in the (existing) pipelineIdArray variable
  3. Iterate the pipelineIdArray variable’s collection of pipelineID values
  4. Execute each child pipeline represented by each pipelineID value stored in the pipelineIdArray variable

Read on to learn how.
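To make the shape of the pattern concrete, here is a rough Python analogue of those four steps. The metadata table (dbo.PipelineMetadata), its columns, and the invoke_child_pipeline() helper are hypothetical stand-ins; Andy builds this with pipeline activities rather than code:

```python
# Rough Python analogue of the metadata-driven parent pipeline.
# dbo.PipelineMetadata and invoke_child_pipeline() are hypothetical stand-ins
# for the Azure SQL metadata table and the "Invoke Pipeline" activity.
import pyodbc

def invoke_child_pipeline(pipeline_id: str) -> None:
    """Placeholder for whatever actually starts the child pipeline."""
    print(f"Executing child pipeline {pipeline_id}")

conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...")
cur = conn.cursor()

# 1. Read pipeline metadata from a table in an Azure SQL database
cur.execute("SELECT PipelineId FROM dbo.PipelineMetadata WHERE Enabled = 1 ORDER BY LoadOrder")

# 2. Store the pipelineID values in a collection (the pipelineIdArray variable)
pipeline_id_array = [row.PipelineId for row in cur.fetchall()]

# 3. and 4. Iterate the collection and execute each child pipeline
for pipeline_id in pipeline_id_array:
    invoke_child_pipeline(pipeline_id)
```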


Move Data between Lakehouses and Workspaces in Microsoft Fabric

Gilbert Quevauvilliers performs an exfiltration:

With the new schemas in a Lakehouse, it is now possible to read from Lakehouse A (in Workspace A) and write to Lakehouse B (in Workspace B).

Here are more details about the Schema preview: Lakehouse schemas (Preview) – Microsoft Fabric | Microsoft Learn

This opens a whole new world of possibilities.

I also really like the fact that I can simply use the names, and I do not need to get the actual GUIDs!

For example, I can use the format shown below, which is WorkspaceName.LakehouseName.SchemaName.TableName.
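Here is a minimal PySpark sketch of what that four-part naming looks like from a Fabric notebook (which supplies the spark session). The workspace, lakehouse, schema, and table names are placeholders, not the ones Gilbert uses, and this assumes the schema preview is enabled on both lakehouses:

```python
# Minimal PySpark sketch of the four-part naming from a Fabric notebook.
# Workspace, lakehouse, schema, and table names below are placeholders.

# Read from Lakehouse A in Workspace A...
df = spark.sql("SELECT * FROM WorkspaceA.LakehouseA.dbo.SalesOrders")

# ...and write to Lakehouse B in Workspace B.
df.write.mode("overwrite").saveAsTable("WorkspaceB.LakehouseB.dbo.SalesOrders")
```

Note that there isn’t a GUID anywhere in those calls.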

Click through to see it in action.


Dynamically Start a Collection of Child Pipelines in Fabric Data Factory

Andy Leonard continues a series on Microsoft Fabric Data Factory:

In this post, I modify the dynamic parent pipeline from the previous post to explore calling several child pipelines from a single parent pipeline. We will:

  • Clone the child pipeline (twice)
  • Copy the cloned child pipeline id values
  • Clone the dynamic parent pipeline from the previous post
  • Add and configure a pipeline variable for an array of child pipeline ids
  • Add and configure a ForEach
    • Move the “Invoke Pipeline (Preview)” activity
    • Configure the “ForEach”
    • Configure the “Invoke Pipeline (Preview)” Activity to Use “ForEach” Items
  • Test the execution of a dynamic collection of child pipelines

Andy’s got quite a bit in this post, so check it out.
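If you would rather see the moving parts outside the designer, here is a hedged sketch of what the ForEach-over-an-array-of-ids pattern boils down to when driven from Python via the Fabric REST API. The ids are placeholders, and the job-instances endpoint and jobType value are my reading of the public run-on-demand-job API, not anything taken from Andy’s post:

```python
# Hedged sketch: iterate an array of child pipeline ids and start each one
# through Fabric's "run on demand item job" REST endpoint. Ids are placeholders,
# and the endpoint/jobType values are assumptions based on the public docs.
import requests
from azure.identity import DefaultAzureCredential

workspace_id = "00000000-0000-0000-0000-000000000000"   # placeholder
pipeline_id_array = [
    "11111111-1111-1111-1111-111111111111",              # placeholder child ids
    "22222222-2222-2222-2222-222222222222",
]

token = DefaultAzureCredential().get_token("https://api.fabric.microsoft.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

# The ForEach activity, expressed as a plain loop
for pipeline_id in pipeline_id_array:
    url = (f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
           f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline")
    response = requests.post(url, headers=headers)
    response.raise_for_status()
    print(f"Started child pipeline {pipeline_id}: HTTP {response.status_code}")
```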


Dynamically Start a Child Pipeline in Fabric Data Factory

Andy Leonard continues a series on Fabric Data Factory design patterns:

In an earlier post titled Fabric Data Factory Design Pattern – Basic Parent-Child, I demonstrated one way to build a basic parent-child design pattern in Fabric Data Factory by calling one pipeline (child) from another pipeline (parent). In a later post titled Fabric Data Factory Design Pattern – Parent-Child with Parameters, I modified the parent and child pipelines to demonstrate passing a parameter value from a parent pipeline when calling a child pipeline that contains a parameter.

In this post, I modify a parent pipeline to explore parameterizing which child pipeline will be called by the parent pipeline. We will:

  • Copy the child pipeline id
  • Clone a parent pipeline
  • Add and configure a pipeline variable for the child pipeline id
  • Test the dynamic pipeline id

Read on to see how.


Restarting Failed Control Flows in Azure Data Factory

Meagan Longoria doesn’t want to repeat good work:

I presented at SQL Saturday Pittsburgh this past weekend about populating your data warehouse with a metadata-driven, pattern-based approach. One of the benefits I mentioned is that it’s easy to employ this pattern for restartability.

For instance, let’s say I am loading data from 30 tables and 5 files into the staging area of my data mart or data warehouse, and one of the table loads fails. I don’t want to reload the other tables I just loaded; I want to load only the ones that have not been recently loaded. Or let’s say I have 5 dimensions and 4 facts, and I had a failure loading a fact table. I don’t want to reload my dimensions, and I only want to reload the failed facts. How do we accomplish this?

Read on to learn how.
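The gist of the restartability idea, sketched in Python against a hypothetical load-log table (dbo.LoadLog, its columns, and run_load() are all invented here; Meagan implements this with Data Factory metadata): only the failed or stale loads get queued for the re-run.

```python
# Hypothetical sketch of metadata-driven restartability: re-run only the loads
# that failed or have not completed recently. dbo.LoadLog, its columns, and
# run_load() are invented for illustration; the post builds this in ADF.
from datetime import datetime, timedelta
import pyodbc

def run_load(object_name: str) -> None:
    """Placeholder for the pipeline/activity that actually loads one object."""
    print(f"Reloading {object_name}")

conn = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=...;DATABASE=...;UID=...;PWD=...")
cur = conn.cursor()

cutoff = datetime.utcnow() - timedelta(hours=12)
cur.execute(
    """
    SELECT ObjectName
    FROM dbo.LoadLog
    WHERE LoadStatus <> 'Succeeded' OR LastLoadDate < ?
    """,
    cutoff,
)

# Only the failed/stale objects are reloaded; everything already loaded is skipped.
for row in cur.fetchall():
    run_load(row.ObjectName)
```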


Invoking a Fabric Data Factory Pipeline from a Parent Pipeline

Andy Leonard takes us through a design pattern:

In an earlier post, I demonstrated one way to build a basic parent-child design pattern in Fabric Data Factory by calling one pipeline (child) from another (parent). In this post, I modify the parent and child pipelines to demonstrate calling a child pipeline that contains a parameter. We will:

  • Clone and edit the child pipeline
  • Clone and edit the parent pipeline
  • Test

Read on to see how it works.
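As a companion sketch (not Andy’s implementation), passing a value from parent to child boils down to something like this; the helper, pipeline id, and parameter name are all hypothetical:

```python
# Hypothetical sketch of the parent-child-with-parameter call. The helper,
# child pipeline id, and parameter name are invented for illustration;
# in the post this is an "Invoke Pipeline" activity with a parameter mapping.
def invoke_child_pipeline(pipeline_id: str, parameters: dict) -> None:
    """Placeholder for the Invoke Pipeline activity."""
    print(f"Invoking {pipeline_id} with {parameters}")

child_pipeline_id = "11111111-1111-1111-1111-111111111111"   # placeholder

# The parent supplies the value; the child declares a matching parameter.
invoke_child_pipeline(child_pipeline_id, parameters={"ParentSourceName": "demo value from parent"})
```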


Cloud Connections in Microsoft Fabric

Dennes Torres makes a connection:

I wrote about cloud connections when they were at a very early stage.

Cloud connections have evolved and are now shareable. The “regular” connection is now called a “personal connection”.

The problem with personal connections is that they make teamwork difficult. Personal connections belong to you, and other developers can’t use them. When a different developer needs to work with the same objects, they are required to create their own connection.

Using cloud connections, we can create a single, reusable connection to the data source and share it with all the developers in the team.

Read on to learn more about how they work now that the feature is a bit more mature.
