Press "Enter" to skip to content

Day: July 16, 2020

Contrasting ANOVA against Regression

Stephanie Glen contrasts ANOVA against typical regression techniques using a picture:

If you scour the internet for “ANOVA vs Regression”, you might be confused by the results. Are they the same? Or aren’t they? The answer is that they can be the same procedure, if you set them up to be that way. But there are differences between the two methods. This one picture sums up those differences.

Click through to see that image.

Comments closed

Survival Analysis in Spark

Rab Saker and Bryan Smith hit on a topic close to my heart:

These patterns seem to indicate that KKBox could actually differentiate between customers based on their lifetime potential using information known at the time of acquisition. This information might help inform or steer specific discounts or promotions to customers as they register for a trial. This information might also inform KKBox of which offerings or capabilities to discontinue as some, e.g. Initial Payment Method 35 or the 7-day payment plan as shown in Figure 3, align with exceptionally high churn rates in the first 30-days with little long-term survivorship.

Of course, there are relationships between these factors so that we should be careful in viewing them in isolation. By deriving a baseline risk (hazard) of customer churn (Figure 4), we can calculate the influence of different factors on the baseline in such a manner that each factor may be considered an independent hazard multiplier.  When combined (through simple multiplication) against the baseline, we can plot the a specific customer’s chances of abandoning a subscription by a given point in time (Table 1).

Click through for the story as well as a set of notebooks.

Comments closed

Minimum Permissions Required for Get-DbaDbUser

Shane O’Neill walks us through wants to figure out minimum permissions required for the Get-DbaDbUser cmdlet in dbatools:

I’m not going to sugarcoat things – the person that sent me the request has more access than they rightly need. The “public” access worker did not need any of that access so I wasn’t going to just give her the same level.

Plus, we’re supposed to be a workforce that has embraced the DevOps spirit and DevOps is nothing if it doesn’t include Security in it.

So, if I could find a way to give the user enough permission to run the command and not a lot more, then the happier I would be.

Shane takes us through the process so we don’t have to.

Comments closed

A Warning on Power BI Custom Visuals

Martin Schoombee gives us a warning around relying upon free custom visuals:

Before you get the impression that I’m against custom visuals, let me say this: I love custom visuals! I myself have used many custom visuals in the past and have been very quick to look for a custom visual when I couldn’t get something to display or work the way I needed it to in Power BI.

Custom visuals fill an important gap where the base product is not yet where it needs to be, and what better way for Microsoft to see what people need and where they need to invest more time from a visualization standpoint? It’s an awesome concept and I like it.

Unfortunately there are a few BUT’s to follow, but let me first tell you my story…

Read the whole thing. I like custom visuals a lot, but there are risks in a corporate world, and I don’t necessarily mean security.

Comments closed

Creating a Power BI Streaming Dataset

Rob Farley takes us through the process of creating and using a Power BI streaming dataset:

Real-time Power BI sets are a really useful feature, and there’s a good description of them at https://docs.microsoft.com/en-us/power-bi/connect-data/service-real-time-streaming. I thought I’d do a quick walkthrough specifically around the Push side, and show you – including the odd gotcha that you might not have noticed.

To create a dataset that you want to push data into, you need to go to the Power BI service, go to your Workspace, and create a Streaming dataset. Even if you’re not wanting to use it with a streaming service, this is the one you need.

Rob has plenty of animated GIFs to walk you through the process, as well as a couple of caveats if you want to play along at home.

Comments closed

When Batch Mode on Rowstore Hurts Performance

Erik Darling walks us through a scenario where batch mode on rowstore can make performance of a query worse:

I’m not mad at 2019 or Batch Mode On Rowstore (BMOR) or anything.

But if I’m gonna get into it, I’m gonna document issues I run into so that hopefully they help you out, too.

One thing I ran into recently was where BMOR kicked in for a query and made it slow down.

Click through for the scenario, why it’s slower when using batch mode, and two ways you can improve the query.

Comments closed

Learning About Index Utilization with dbatools

Ben Miller takes us through a way to know your data:

You have many tables in your databases and you want to know how they are used. There are DMVs for index usage stats which will tell you about like sys.dm_db_index_usage_stats and querying them is insightful, but how do the stats change over time? These stats are reset when the instance is restarted and it is good to know that you have 2000 seeks and 500 scans of the index, but when did they happen? Was it on a common day? Common hour?

Ben has a way to help you figure that out.

Comments closed