Day: August 14, 2020

Covariance and Multicollinearity

Mattan Ben-Shachar gives us an intuitive understanding of multicollinearity and how it can affect an analysis:

The common and almost default approach is to fix age to a constant. This is really what our model does in the first place: the coefficient of height represents the expected change in weight while age is fixed and not allowed to vary. What constant? A natural candidate (and indeed emmeans’ default) is the mean. In our case, the mean age is 14.9 years. So the expected values produced above are for three 14.9 year olds with different heights. But is this data plausible? If I told you I saw a person who was 120cm tall, would you also assume they were 14.9 years old?

No, you would not. And that is exactly what covariance and multicollinearity mean – that some combinations of predictors are more likely than others.

I liked the explanation Mattan provides us. Also be sure to read the warnings near the end of the post around other things to try. H/T R-bloggers
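
To make the point concrete, here is a minimal sketch using made-up data (it is not Mattan's dataset or model, and the growth curve and noise level are assumptions chosen only for illustration). It simulates people whose height depends on age, then compares the overall mean age with the typical age of people who are about 120 cm tall.

```python
import random
import statistics

random.seed(0)

# Synthetic, made-up data: ages 5-25, height grows with age until about 18.
ages = [random.uniform(5, 25) for _ in range(2000)]
heights = [80 + 5.5 * min(a, 18) + random.gauss(0, 6) for a in ages]

# The "fix age at its mean" approach uses this single value for everyone.
mean_age = statistics.fmean(ages)

# Ages of the simulated people who are roughly 120 cm tall.
near_120 = [a for a, h in zip(ages, heights) if abs(h - 120) < 5]
typical_age_at_120 = statistics.fmean(near_120)

print(f"mean age overall: {mean_age:.1f}")
print(f"mean age among ~120 cm people: {typical_age_at_120:.1f}")
```

In this simulation the people near 120 cm are much younger than the overall mean age, so a prediction for "a 120 cm person at the mean age" describes a combination of predictors that barely occurs in the data.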

Classification Problems and Classification Rules

John Mount warns against simply returning a class in a classification problem:

This statement is a bit of word-play which I will need to unroll a bit. However, the concrete advice is that you often get better results using models that return a continuous score for classification problems. You should make that numeric score available to downstream business logic instead of making a class choice at model prediction time. Using the word “classifier” to informally mean “scoring procedure for classes” is not that harmful. Losing a numeric score is harmful.

Read the whole thing, as John lays out a good argument.
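
A small sketch of the idea, under assumed numbers: the scores, thresholds, and the two downstream consumers below are hypothetical and do not come from John's post. The model hands back a numeric score, and each consumer applies its own decision rule.

```python
def score_to_class(score: float, threshold: float) -> bool:
    """Turn a numeric score into a class using a caller-supplied threshold."""
    return score >= threshold


# Hypothetical model outputs (for example, estimated probabilities).
scores = [0.05, 0.35, 0.55, 0.92]

# Two hypothetical consumers with different costs of being wrong.
manual_review = [score_to_class(s, 0.30) for s in scores]  # cheap to act on
auto_block = [score_to_class(s, 0.90) for s in scores]     # expensive to act on

# What a model would return if it picked the class itself at 0.5.
hard_labels = [score_to_class(s, 0.50) for s in scores]

print(manual_review)
print(auto_block)
print(hard_labels)
```

Once only `hard_labels` is kept, neither the 0.30 rule nor the 0.90 rule can be recovered from it, which is the loss the quote warns about.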

Performance Tuning SSIS Data Flows

Mark Broadbent reviews a SQLBits talk:

Yes, before you say it, I know SQL Server Integration Services is “old technology” but a lot of people are still using it, and in many cases are either still developing against it, or are looking to integrate/migrate with other burgeoning technologies such as Azure Data Factory. In other words, if you are not currently using SSIS then this post is probably not for you; otherwise, read on.

If you are still one of the lucky ones to still be using SSIS, I thought it would be worth publishing these comprehensive notes taken from a session titled “SSIS Data Flow Performance Tuning” delivered at SQLBits 8 (Brighton) by the then “SSIS guru” Jamie Thomson. Notes have timings (in mins and seconds) against them, which correlate directly with the presentation times. The video is still available and can be downloaded from the SQLBits website so you can watch it (if required) and use the timings to follow along.

It’s an asynchronous watch party with Mark.

Splatting in PowerShell

Mark Wilkinson describes splatting in PowerShell and shows how you can use it to handle optional parameters:

I have to start off by saying I hate the name “splatting”. I didn’t come up with it, and I don’t like using it, but it’s the only word we have. Splatting is a way to pass parameter values to a function using a single array or hashtable. In this post we’ll be talking about hashtables because I think it is the more useful of the two.

Splatting is easy to explain in an example. 

And then that’s exactly what Mark gives us. Click through for the example as well as how you can set those optional parameters.
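
For readers who do not write PowerShell, a rough analog is Python's dictionary unpacking with `**`. This is not Mark's example and not PowerShell syntax; the `connect` and `build_args` functions are hypothetical and exist only to show the pattern of building up a set of named arguments and leaving out the optional ones that were not supplied.

```python
def connect(server: str, database: str = "master", timeout: int = 30) -> str:
    """Hypothetical function with one required and two optional parameters."""
    return f"{server}/{database}?timeout={timeout}"


def build_args(server, database=None, timeout=None):
    """Collect arguments in a dict, adding optional ones only when given."""
    args = {"server": server}
    if database is not None:
        args["database"] = database
    if timeout is not None:
        args["timeout"] = timeout
    return args


# Only 'timeout' is supplied, so 'database' falls back to its default.
args = build_args("localhost", timeout=5)
result = connect(**args)
print(result)  # localhost/master?timeout=5
```

The benefit is the same as the one described for hashtable splatting: the call site stays a single line no matter how many optional parameters end up being set.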

Blocking Classic Workspaces in Power BI

Adam Saxton points out something new in Power BI:

The ability to BLOCK classic workspaces from being created in Power BI is finally here! Adam shows you how to implement and what to consider. Create Microsoft Teams without the worry!

Click through for a video as well as the Power BI blog post describing this. You can also tell that Adam has the heart of a DBA based on the level of excitement around blocking something. DBAs and goalies, I tell you.

Overriding SSRS Authentication

Eitan Blumin doesn’t like the SSRS authentication prompt:

In this post, I hope to summarize the various methods that we have, in order to get rid of that annoying authentication prompt. Each method has its own advantages and disadvantages in terms of complexity of implementation, versatility, and the level of security that it provides. More specifically: the more secure and versatile a method is – the more complicated it is to implement.

Read on for four such techniques, as well as a bonus technique.
