November 2020 – Page 2

The top left of this execution plan looks very unsurprising. A Clustered Index Scan to read all rows from the #Products temporary table, and those rows are then passed to a Clustered Index Insert operator for insertion into our permanent table dbo.Products. But wait? Why is there no table or index name listed below the operator name? That’s actually because the operator has not one but three items in its Object property. When a Clustered Index Insert targets just a single index, SSMS can work out the name and show it. But when Clustered Index Insert targets multiple objects, SSMS plays it safe and displays none.

Read on to learn more.

Comments closed

Thinking about Virtual Log Files

Published 2020-11-25 by Kevin Feasel

Julie Behrens, via Kevin Hill, covers the concept of virtual log files:

It is especially evident there is an issue with VLFs when SQL Server takes a long time to recover from a restart. Other symptoms may be slowness with autogrowth, log shipping, replication, and general transactional slowness. Anything that touches the log file, in other words.

Read on to understand how to figure out if you have a problem with virtual log file counts and a resolution.

Comments closed

Identifying Straggler Tasks in Spark Applications

Published 2020-11-25 by Kevin Feasel

Ajay Gupta clues us in on a process:

What Is a Straggler in a Spark Application?
A straggler refers to a very very slow executing Task belonging to a particular stage of a Spark application (Every stage in Spark is composed of one or more Tasks, each one computing a single partition out of the total partitions designated for the stage). A straggler Task takes an exceptionally high time for completion as compared to the median or average time taken by other tasks belonging to the same stage. There could be multiple stragglers in a Spark Job being present either in the same stage or across multiple stages.

Read on to understand the consequences and causes of these straggler tasks, as well as what you can do about them.

Comments closed

Query Store Checks in dbatools

Published 2020-11-25 by Kevin Feasel

Jess Pomfret contributes to dbatools:

Once I was happy with my settings, I realised we were missing a ‘test’ command for dbatools. The suite of ‘test’ functions in dbatools (a lot that end up as checks in dbachecks btw!), give us an easy way to check our environment against best practices, or our desired settings.
Since dbatools is open-source I was able to write this function (Test-DbaDbQueryStore) and get it added into the module. It’s included as of version 1.0.131, so make sure you’re up to date. Taking Erin’s suggestions and wrapping them in a little PowerShell, I can make it easier for myself and everyone else to make sure we’re following her guidelines.

Click through to see what those settings look like and how you can compare against current settings.

Comments closed

Creating Sequence Diagrams for SSIS Packages

Published 2020-11-25 by Kevin Feasel

Aveek Das has an idea for documentation:

In this article, I am going to explain in detail how to document SSIS packages using Sequence Diagrams and the importance of these diagrams in the field of software engineering, no matter which programming language are you using. In my previous article, I have talked about the various UML Diagrams that are being used to document various software engineering processes. Also, I have talked about modular ETL architecture and how to create such a modular package in SSIS. Sequence diagrams are also a part of the broader UML Diagrams which define the interaction between the various components in the system in a chronological manner.

My gut feeling is that this works best with medium-sized collections of packages, where we’re talking 10-30 or so packages in total, and that for something much larger, I’d want an automated tool to build diagrams for me. But I could be way off base on that.

Comments closed

Load Balanced Services and Ingress in Kubernetes

Published 2020-11-25 by Kevin Feasel

Andrew Pruski contrasts techniques:

What is the difference between using a load balanced service and an ingress to access applications in Kubernetes?
Basically, they achieve the same thing. Being able to access an application that’s running in Kubernetes from outside of the cluster, but there are differences!

Read on to learn about the key differences between these two.

Comments closed

Contingent Power BI Dataset Refreshes

Published 2020-11-25 by Kevin Feasel

Chris Webb has an interesting problem to solve:

This week a customer came to me with the following problem: they had scheduled the refresh of their dataset but their source data wasn’t always ready in time, so the old data was being loaded by mistake. The best solution here is to use some kind of external service (for example Power Automate) to poll the data source regularly to see if it’s ready, and then to refresh the dataset via the Power BI REST API when it is. However, it got me thinking about a different way of tackling this: is it possible to write some M code that will do the same thing? It turns out that it is, but it’s quite complicated – so I don’t recommend you use the code below in the real world. Nevertheless I wanted to write up the solution I came up with because it’s interesting and you never know, it might be useful one day.

Read on for the less-than-optimal solution, but do check out the better solution Chris describes.

Comments closed

Global Parameters in SSIS Framework

Published 2020-11-25 by Kevin Feasel

Andy Leonard has an update for us:

I’m happy to announce the latest version of our SSIS Framework includes global parameters! I can hear some of you thinking, …
“What Are Global Parameters, Andy?”
I’m so glad you asked! SSIS ships with package-scoped and project-scoped parameters. Project-scoped parameters may be used in any SSIS package in the project; package-scoped parameters are only available within the context of a single SSIS package. This functionality reduces repetition in SSIS package development and execution configuration.
Global parameters allow our SSIS Framework customers to set parameters and values that apply to the entire SSIS Catalog.

Now that you know what they are, Andy has an example of them in action. Global parameters aren’t part of the community edition, but they do look interesting.

Comments closed

Defining Simple Regression

Published 2020-11-24 by Kevin Feasel

Vincent Granville has a new algorithm for us:

The model-free, data-driven technique discussed here is so basic that it can easily be implemented in Excel, and we actually provide an Excel implementation. It is surprising that this technique does not pre-date standard linear regression, and is rarely if ever used by statisticians and data scientists. It is related to kriging and nearest neighbor interpolation, and apparently first mentioned in 1965 by Harvard scientists working on GIS (geographic information systems). It was referred back then as Shepard’s method or inverse distance weighting, and used for multivariate interpolation on non-regular grids (see here and here). We call this technique simple regression.

Read on to learn more about simple regression, including how to implement it and how it performs.

Comments closed

Automating Database Deployments: Why Not?

Published 2020-11-24 by Kevin Feasel

Grant Fritchey asks a question:

Building out processes and mechanisms for automated code deployments and testing can be quite a lot of work and isn’t easy. Now, try the same thing with data, and the challenges just shot through the roof. Anything from the simple fact that you must maintain the persistence of the data to data size to up time, and you have real problems in front of you.
However, adopting database deployment automation and testing has enormous benefits. Faster, safer, production deployment enhances the protection built around your production systems. Whether we want to use the loaded term of DevOps or not, the benefits of this style of development and deployment are easily documented and measured.
So, why are so few people doing it?

Grant gives some of the outline and lays out one response. I am seeing a lot more automation over time, but one underappreciated facet in this is a lack of trust for automated processes from humans. I think a good percentage of DBAs don’t trust that the automated process will get things correct, especially when dealing with complex chains of dependencies. An automated process may be less likely to make a mistake in a step, but it will also be unable to reason through an ambiguity and could perform an undesirable action in the event of unexpected circumstances. That’s a pretty big risk for DBAs who are concerned about their data. I can see a few other reasons as well, but this is one which I don’t hear often enough in these discussions.

Also, Grant asks people to fill out the State of Database DevOps survey, especially those people who are not automating database deployments.

1 Comment

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Month: November 2020

How Foreign Keys Appear in Execution Plans

Thinking about Virtual Log Files

Identifying Straggler Tasks in Spark Applications

Query Store Checks in dbatools

Creating Sequence Diagrams for SSIS Packages

Load Balanced Services and Ingress in Kubernetes

Contingent Power BI Dataset Refreshes

Global Parameters in SSIS Framework

Defining Simple Regression

Automating Database Deployments: Why Not?