Press "Enter" to skip to content

Month: July 2020

ADF.Procfwk Version 1.8

Paul Andrew has been busy:

Following more great feedback from the Data Platform community the primary goal of this release was to further improve the resilience of the framework processing. These improvements included its restart clean up capabilities and introducing better dependency chain handling between Worker pipelines when failures occur. The latter builds on the existing restart functionality first introduced in release v1.2 and supplements the logic using a new set of pipeline dependency metadata. I’ve created the below visual to conceptually show the new dependency chain behaviour, should you wish to populate and make use of the new metadata handling.

Read on for the full changelog.

Comments closed

A Note on Distributed Network Names

Allan Hirt provides an explanation around Distributed Network Names when building Windows Server Failover Clusters on Windows Server 2019:

The new Windows Server 2019 DNN functionality does have a side effect that does affect Azure-based configurations. When creating a WSFC, Windows Server 2019 detects that the VM is running in Azure and will use a DNN for the WSFC name. This is the default behavior.

I clipped this paragraph specifically because Allan uses both “affect” and “effect” correctly, and I wanted to call that out. Do read the rest of it as well.

Comments closed

Understanding the Identity Cache

Jignesh Raiyani explains the purpose of the identity cache and shows how you might disable it if you want to:

In the release of SQL Server 2012, Microsoft has introduced a feature of Identity Cache. The identity jump doesn’t cause any issue neither with the database nor the running tasks; however, this identity value gap is not acceptable in some of the business-oriented scenarios. This Identity counter values of the columns are stored in the system table separately, which is known as sys.identity_columns DMV. This reference object sys.identity_columns does not rely on the transaction status, and it doesn’t matter that the transaction on the user table is committed or rolled back. Now, in case this user transaction is being rolled back, we will find that the identity value has been skipped in the sys.identity_columns table. The reason being is that the value of the Identity column value does not get rolled back. Apart from the above scenario, there could be various reasons or situations where these issues of the skip of the Identity value will be seen, which will be small in number and random. However, when there is an issue of Identity value jump, the amount of the skip value will be prominent and precise in number, depending upon the column data type, which is easy to identify.

My advice is, don’t disable this unless you have a really good reason to. “I don’t want to see numbers skipped” is a bad reason because identity columns never guarantee sequential insertion. You can still have skips due to transaction rollbacks or other errors on insertion, and the identity cache helps with performance problems in cases with lots of writers inserting rows into the table.

Comments closed

Data Lakes for Smaller Projects

Thomas Spicer explains that your data lake doesn’t need to be enormous to be useful:

We recently wrote an article debunking common myths about data lake architectures, data lake definitions, and data lake analytics. It is called What is a Data Lake? Get A Leg Up Avoiding The Biggest Myths.” In that article, we framed the current conversation about data lakes and how they fit within enterprise data strategies. This topic has historically been confusing and opaque for those wanting to get value from a data lake due to conflicting advice from consultants and vendors.  

One area that can be particularly confusing is the perception that lakes are only for “big data.” If you spend any time reading materials on lakes, you would think there is only one type and it would look like the Capsian Sea (it’s a lake despite “sea” in the name). People describe data lakes as massive, all-encompassing entities, designed to hold all knowledge. The good news is that lakes are not just for “big data” and you have more opportunities than ever to have them be part of your data stack.

Click through for Thomas’s argument.

Comments closed

The Basics of Autoregressive Models

Holger von Jouanne-Diedrich explains some of the principels of autoregressive models through a demonstration:

Well, this seems to be good news for the sales team: rising sales! Yet, how does this model arrive at those numbers? To understand what is going on we will now rebuild the model. Basically, everything is in the name already: auto-regressive, i.e. a (linear) regression on (a delayed copy of) itself (auto from Ancient Greek self)!

So, what we are going to do is create a delayed copy of the time series and run a linear regression on it. We will use the lm() function from base R for that (see also Learning Data Science: Modelling Basics).

Read on for some additional understanding.

Comments closed

Tips for Debugging in Visual Studio

Patrick Smacchia has 12 tips for debugging in Visual Studio:

4) Data breakpoint: Break when value changes

If you set a breakpoint to a non-static property setter it will be hit when changing the property value for all objects. The same behavior can be obtained for a single object thanks to the Locals (or Watch) window right click : Break When Value Changes menu.

This facility is illustrated with the animation above. The hit occurs only when obj2.Prop is changed, not when obj1.Prop is changed.

These go a step beyond the basics, so check them out.

Comments closed

Configuring Power BI Incremental Refresh

Gilbert Quevauvilliers has a follow-up from a post:

Following on from my successful blog post How you can incrementally refresh any Power BI data source (This example is a CSV File), I found a way where I can just use dates created in Power Query to get data refreshing incrementally.

Full credit goes to Rafael Mendonç who actually figured this out. All that I have done is to translate what Rafael Mendonça did in his PBIX and put it into steps that you can follow along with.

https://www.rafaelmendonca.com/2020/06/incremental-powerbi-csv-api-excel-odbc.html

In this blog post I am going to demonstrate how to get this working with what I hope is very easy to follow.

Read on for the process.

Comments closed

Truncating All Tables in a Database with Powershell

Jess Pomfret nukes the database from orbit, as it’s the only way we can be sure:

The most popular post on my blog so far was called ‘Disable all Triggers on a Database’ and this one is a good follow up from that post.

The scenario here is you need to remove all the data from the tables in your database. This could be as part of a refresh process, or perhaps to clear out test data that has been entered through an application.  Either way, you want to truncate all the tables in your database.

Click through for the code.

Comments closed