Press "Enter" to skip to content

Month: December 2020

So You’ve Hit the Limits of ADF Concurrency

Paul Andrew shows what happens when you break the ADF concurrency barrier:

Firstly, understanding how these limits apply to your Data Factory pipelines takes a little bit of thinking about, considering you need to understand the difference between an internal and an external activity. Then you need to think about this with the caveats of being per subscription and, importantly, per Azure Integration Runtime region.

Assuming you know that, and you’ve hit these limits!

Click through to see what happens. It’s not pretty.


Logging Schema Changes with DDL Triggers

Andreas Wolter shows how we can implement a log of object changes with DDL triggers:

Over the years working on customer systems, I personally found it to be invaluable and as best practice equipped any database that I designed with such a small trigger and DDL-log-table, just in case. It has helped many times to quickly solve issues with deployment scripts, non-scripted changes to the systems, problems with Source Control and simply getting answers quickly.
The concept is almost trivial and because DDL changes are usually not in performance-critical code-paths, the theoretical overhead on the DDL statement-runtimes is not relevant. (Unless frequent schema-changes are part of a performance-sensitive workload – in which case I would then question if using DDL is a good idea at all in such a place. Note that temporary tables are not caught by DDL Triggers.)

Click through for more information, as well as a sample script.
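For reference, a minimal sketch of the pattern (table, column, and trigger names here are hypothetical, not Andreas's script): a database-scoped DDL trigger that shreds EVENTDATA() into a log table.

```sql
-- Hypothetical names; a minimal sketch of the DDL-logging pattern.
CREATE TABLE dbo.DDLChangeLog
(
    LogId       int IDENTITY(1,1) PRIMARY KEY,
    EventTime   datetime2(0)  NOT NULL DEFAULT SYSUTCDATETIME(),
    LoginName   sysname       NOT NULL DEFAULT ORIGINAL_LOGIN(),
    EventType   nvarchar(128) NULL,
    ObjectName  nvarchar(256) NULL,
    TSQLCommand nvarchar(max) NULL
);
GO
CREATE TRIGGER trg_LogDDL
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @e xml = EVENTDATA();

    -- Shred the event XML into the log table.
    INSERT dbo.DDLChangeLog (EventType, ObjectName, TSQLCommand)
    VALUES
    (
        @e.value('(/EVENT_INSTANCE/EventType)[1]',  'nvarchar(128)'),
        @e.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(256)'),
        @e.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'nvarchar(max)')
    );
END;
```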


Compressing and Storing LOB Data

Randolph West hits the archives:

When we look at the several data types for storing LOB data in SQL Server, the ones we might think of first are specifically designed for storing clear text. These are VARCHAR(MAX) and NVARCHAR(MAX), which you can read up about on Microsoft Docs. They’re variable-length data types that are used to store clear text between 8,000 bytes and 2 GB. The key distinction is that NVARCHAR is for Unicode strings, which use at least two bytes per character.

Then we get VARBINARY(MAX), also for storing between 8,000 bytes and 2 GB. This data type does not make any assumptions about the stored blob’s format. You can store text, zip files, images, public key security certificates, you name it. If it can be persisted to storage, it can be stored in VARBINARY(MAX).

Read on for an example of Randolph using Gzip compression and storing XML data as binary for auditing purposes.
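One way to do that compression entirely in T-SQL is the built-in COMPRESS and DECOMPRESS functions (SQL Server 2016 and later), which use Gzip under the covers; the audit table and payload below are hypothetical:

```sql
-- Hypothetical audit table; COMPRESS/DECOMPRESS are built in (SQL Server 2016+).
CREATE TABLE dbo.AuditBlob
(
    AuditId int IDENTITY(1,1) PRIMARY KEY,
    Payload varbinary(max) NOT NULL
);

DECLARE @xml nvarchar(max) =
    N'<audit><change col="Price" old="10" new="12" /></audit>';

-- COMPRESS returns a Gzip-compressed varbinary(max).
INSERT dbo.AuditBlob (Payload)
VALUES (COMPRESS(@xml));

-- Cast DECOMPRESS back to the type you compressed, or you get gibberish.
SELECT CAST(DECOMPRESS(Payload) AS nvarchar(max)) AS OriginalXml
FROM dbo.AuditBlob;
```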


Working with Self-Hosted Integration Runtimes

Craig Porteous walks us through some of the planning necessary for self-hosted integration runtimes:

If your Data Factory contains a self-hosted Integration runtime, you will need to do some planning work before everything will work nicely with CI/CD pipelines. Unlike all other resources in your Data Factory, runtimes won’t deploy cleanly between environments, primarily as you connect the installed runtime directly to a single Data Factory. (We can add more runtime nodes to a single Data Factory but we cannot share a single node between many data factories*). An excerpt from Microsoft’s docs on Continuous integration and delivery in Azure Data Factory mentions this caveat.

Read on for the consequences and two options available to you.


Temp Tables as a Release Valve for Design Issues

Erik Darling shows a benefit of temp tables, drawn from cold, bitter experience:

A lot of the time when I see queries that are written with all sorts of gymnastics in the join or where clause and I ask some questions about it, people usually start complaining about the design of the table.

That’s fine, but when I ask about changing the design, everyone gets quiet. Normalizing tables, especially for Applications Of A Certain Age™ can be a tremendously painful project. This is why it’s worth it to get things right the first time. Simple!

Rather than make someone re-design their schema in front of me, often times a temp table is a good workaround.

Click through for an example.
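The general shape of the workaround looks something like this (hypothetical schema, not Erik's example): materialize the expression once into a #temp table, index it, and join on a plain column.

```sql
-- Hypothetical schema: OrderRef buries the customer id inside a string,
-- so the original query joined on SUBSTRING(), defeating index use.
SELECT
    o.OrderId,
    TRY_CONVERT(int, SUBSTRING(o.OrderRef, 4, 10)) AS CustomerId
INTO #OrderKeys
FROM dbo.Orders AS o;

CREATE CLUSTERED INDEX cx_CustomerId ON #OrderKeys (CustomerId);

-- The join is now a plain equality on an indexed column.
SELECT c.CustomerName, k.OrderId
FROM #OrderKeys AS k
JOIN dbo.Customers AS c
    ON c.CustomerId = k.CustomerId;
```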


Stochastic Processes in R

David Robinson takes us through simulation of a random walk in R:

What’s fun about this problem is that it’s an example of a random walk: a stochastic process made up of a sequence of random steps (in this case, left or right). What makes this a fun variation is that it’s a random walk in a circle: passing 5 to the left is the same as passing 15 to the right. I wasn’t previously familiar with a random walk in a circle, so I approached it through simulation to learn about its properties.

Click through for a simulation. Or 50,000 of them.
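David's simulations are in R, but the circular wrap-around is just modular arithmetic. A toy version of a single walk, sketched in this blog's native T-SQL with positions 0 through 19 standing in for the circle:

```sql
-- One circular random walk: 20 positions, 100 random left/right steps.
-- Adding 20 before the modulo keeps the position non-negative.
DECLARE @pos int = 0, @i int = 1, @steps int = 100;

WHILE @i <= @steps
BEGIN
    SET @pos = (@pos + CASE WHEN RAND() < 0.5 THEN -1 ELSE 1 END + 20) % 20;
    SET @i += 1;
END;

SELECT @pos AS FinalPosition;
```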


Durable Keys in Type 2 Dimensions

Martin Schoombee takes us through the idea of durable keys:

Also called an immutable or persisted key (I like durable better), a durable key is nothing more than a surrogate key (i.e. integer value or nonsensical number) used to identify a dimension member (company, employee, etc.) uniquely in a type-2 dimension. Confusing enough? It’s easier to explain with an example…

When I read Martin’s post, I kind of got it but said to myself, “How would I run this type of query more efficiently?” The thing that wasn’t clicking came from another article on the topic: you add the durable key to the fact as well as the current key. That way, you can join back to the Company dimension on CompanyKey if you want to get the company data as of the fact date, or you can join on DurableCompanyKey (and CurrentRecord = 1) to get the latest company data regardless of the fact date. Now that this is clear, I like the strategy a lot.
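In rough T-SQL terms (hypothetical fact and dimension names), the two join paths look like this:

```sql
-- Point-in-time attributes: join the fact to the type-2 surrogate key.
SELECT f.SaleAmount, c.CompanyName    -- company data as of the fact date
FROM dbo.FactSales AS f
JOIN dbo.DimCompany AS c
    ON c.CompanyKey = f.CompanyKey;

-- Current attributes: join on the durable key, filtered to the current row.
SELECT f.SaleAmount, c.CompanyName    -- latest company data
FROM dbo.FactSales AS f
JOIN dbo.DimCompany AS c
    ON c.DurableCompanyKey = f.DurableCompanyKey
   AND c.CurrentRecord = 1;
```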


dbachecks Against Azure SQL Databases

Jess Pomfret takes us through running dbachecks on an Azure SQL Database:

Last week I gave a presentation at Data South West on dbachecks and dbatools. One of the questions I got was whether you could run dbachecks against Azure SQL Databases, to which I had no idea. I always try to be prepared for potential questions that might come up, but I had only been thinking about on-premises environments and hadn’t even considered the cloud. The benefit is that this gives me a great topic for a blog post.

Click through for the answer.


Parallel Inserts into Temp Tables

Erik Darling explains the pre-conditions for parallel insertion into temporary tables:

If you have a workload that uses #temp tables to stage intermediate results, and you probably do because you’re smart, it might be worth taking advantage of being able to insert into the #temp table in parallel.

Remember that you can’t insert into @table variables in parallel, unless you’re extra sneaky. Don’t start.

If your code is already using the SELECT ... INTO #some_table pattern, you’re probably already getting parallel inserts. But if you’re following the INSERT ... SELECT ... pattern, you’re probably not, and, well, that could be holding you back.

There are enough pre-conditions that this becomes a decision rather than an automatic win. Especially if you’re dealing with temp tables with indexes and want to take advantage of temp table reuse, which I believe precludes changing the structure of the table (including adding indexes) after creation.
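For the flavor of it, here are the two patterns side by side (dbo.BigTable is hypothetical; the full list of preconditions is in Erik's post):

```sql
-- SELECT ... INTO can get a parallel insert on its own (SQL Server 2014+).
SELECT Id, Amount
INTO #stage1
FROM dbo.BigTable;

-- INSERT ... SELECT needs help: a heap target and a TABLOCK hint are among
-- the preconditions for a parallel insert (SQL Server 2016+).
CREATE TABLE #stage2 (Id int NOT NULL, Amount money NOT NULL);

INSERT #stage2 WITH (TABLOCK) (Id, Amount)
SELECT Id, Amount
FROM dbo.BigTable;
```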
