Press "Enter" to skip to content

Author: Kevin Feasel

Measuring Correlation In SQL

Phil Factor shows how to calculate Kendall’s Tau and Spearman’s Rho in SQL:

Kendall’s Tau rank correlation is a handy way of determining how correlated two variables are, and whether this is more than chance. If you just want a measure of the correlation then you don’t have to assume very much about the distribution of the variables. Kendall’s Tau is popular with calculating correlations with non-parametric data. Spearman’s Rho is possibly more popular for the purpose, but Kendall’s tau has a distribution with better statistical properties (the sample estimate is close to a population variance) so confidence levels are more reliable, but in general, Kendall’s tau and Spearman’s rank correlation coefficient are very similar. The obvious difference between them is that, for the standard method of calculation,  Spearman’s Rank correlation required ranked data as input, whereas the algorithm to calculate Kendall’s Tau does this for you.  Kendall’s Tau consumes any non-parametric data with equal relish.

Kendall’s Tau is easy to calculate on paper, and makes intuitive sense. It deals with the probabilities of observing the agreeable (concordant) and non-agreeable (discordant) pairs of rankings. All observations are paired with each of the others, A concordant pair is one whose members of one observation are both larger than their respective members of the other paired observation, whereas discordant pairs have numbers that differ in opposite directions. Kendall’s Tau-b takes tied rankings into account.

I appreciate Phil putting this series together.  I’d probably stick with R, but it’s good to have options.

Comments closed

Rethrowing Exceptions

Vladimir Oselsky shows how to use THROW and RAISERROR for rethrowing exceptions:

Upon executing the first procedure, we get the error message back to the front end, but after checking balance, we find that money withdrawn from the account, but in the case of the second procedure, the same error returned to the front end but money still there.

Now we begin to scratch our head trying to figure out why we lost the money even though we got errors in both cases. The truth behind is the fact that RAISERROR does not stop the execution of code if it is outside of TRY CATCH block. To get same behavior out of RAISERROR, we would need to rewrite procedure to look something like following example.

There are some nuanced differences between THROW and RAISERROR, so it’s valuable to know how both work.

Comments closed

Basics Of R Plotting

Aman Tsegai shows some basic ways to customize R’s plot function:

We’re going to be using the cars dataset that is built in R. To follow along with real code, here’s an interactive R Notebook. Feel free to copy it and play around with the code as you read along.

So if we were to simply plot the dataset using just the data as the only parameter, it’d look like this:

plot(dataset)

The plot function is great for cases where you don’t much care how the visual looks, and the simplicity is great for throwaway visuals.

Comments closed

Thinking About Automation

Chrissy LeMaire has a series of thoughts on this month’s T-SQL Tuesday, and it was worth separating out from the rest of today’s batch:

Y’all know what I’m gonna say here! I love automation and PowerShell. I know for a fact that PowerShell and T-SQL together are the future of SQL Server administration. As someone who often presents about dbatools, the popular SQL PowerShell community project, I’ve seen the excitement and relief that PowerShell automation brings to SQL Server Database Pros.

From making it way easier to migrate entire instances to automating backup testing and verification, PowerShell makes it straight up more enjoyable to be a DBA.

There’s a lot of well-deserved plugging of dbatools.  Hint, hint.

Comments closed

Updating Large Tables In SQL Server And Oracle

Jana Sattainathan has a post on how he was able to move and update billions of rows, using both Oracle and SQL Server as examples:

The key thing to remember with SQL Server is to convert to a non-integer value by using a “decimal” as shown in the above example with “10.”. This is the same as saying “10.0”. Without the “.”, it will result in uneven splits from rounding errors of integers. It is not the result that you intend to have it you want accurate results.

To show you the difference, I have included the SQL and results of a query that uses “.” and the other that does not, with “.” being the only difference:

It’s a good article, and definitely an important thing to think about when you have large tables.

Comments closed

Power BI Quick Measures

Paul Turley has a post on the new “Quick measures” functionality in Power BI:

I had added the new Quick Measures feature to Power BI Desktop in the Options/Preview page.  This, apparently disables Quick Calcs and enables Quick Measures.  Although it flustered me me for a minute in front of an audience, I found this to be welcome news.  So, what is Quick Measures?  It’s a DAX calculation generator that automatically writes useful measures.  Here’s how it works…  Start by right-clicking or clicking on the ellipsis for a numeric column in the Field list and choose Quick measure…

The Quick measures dialog prompts for the necessary fields, which might be different for each calculation.  The tool generates appropriately formatted DAX calculations.  It even includes conditions to raise errors if used in the wrong context.  There are currently 19 different calculation variations that the tool will generate.  Following are two examples.  Creating a Quick measure from my [Flights] measure and choosing the Airline field for categorization produces this calculation:

Looks to be interesting.  Read the whole thing.

Comments closed

Moving Files In Azure Data Factory

Meagan Longoria has a workaround for how you cannot move a file using Azure Data Factory:

But at this time ADF doesn’t support that. You can copy a file with a copy activity, but you cannot actually move (i.e., copy and delete).

Luckily, we had a workaround for our situation. If you tell ADF to copy data to a file that already exists in the specified location in the data lake, it will overwrite the existing file. We made sure the file name is always the same for each table in the staging area so there is always only one file per table.

Read on for the full details on this workaround.  Also, vote on this feedback item if you want the ability to move files instead of just copying them.

Comments closed

Transactional Replication Procedures

Drew Furgiuele offers up warnings when thinking about rolling your own transactional replication stored procedures:

In the above picture, we can see that it did replicate the execute statement, and that it affected 19,972 rows on the replica, and it only took 67ms! Sounds awesome, doesn’t it? Here’s a way to handle large batch updates at your publishers without overwhelming your replication setup. But before you go changing everything, you should probably understand that this has some really, really bad side effects if you’re not careful. Let’s look at three really big ones.

All in all, it’s a fairly risky move but might be worth the performance improvements.

Comments closed

Sharing Power BI Data

Steve Hughes is starting a series on Power BI security:

Another way to compartmentalize or secure data is using Workspaces within Power BI. Every user, including free users, have access to My Workspace which is the default location for deploying Power BI and other BI assets. However, you also have the option to create additional workspaces as deployment targets. These Group Workspaces usually have functional and security separation associated with them.

This post is a good overview of methods available for data sharing.

Comments closed

Sundry Thoughts On Change

Here are a few takes on the most recent T-SQL Tuesday.

Dave Mason is feeling overwhelmed:

The 2-year release cycle has been tough for some of us. Other outside forces have compounded the burden. DBAs have had to learn about virtualization and cloud computing. We’ve had to dip our toes in the No-SQL pool, and embrace automation like never before. Soon, if not already, we’ll be working with containers and supporting SQL Server on Linux. Yeah, it’s trite to talk about how “change is a constant”. (Is there anyone unaware of this?) But most seem to agree that the traditional role of the DBA is undergoing a drastic transformation. Others predict it will be completely unrecognizable, if not extinct, in a few years. What’s a DBA to do? Double down on SQL Server and stay the course? Or branch out to a different field like analytics, BI, or data science?

Riley Major says to use your noggin:

This makes sense. In business, you don’t want to be viewed as a cost center. You want to be on the revenue side of the equation. Whether IT is a competitive advantage or just plumbing depends on how it’s being used. If you’re just keeping the lights on, then you may be as critically important as the electricity itself, but you’re a commodity which can be replaced with a cheaper option. On the other hand, if you are providing insight which directs the company to profits, or if you are developing features which grow market share, your value is obvious.

So if you’re on the administration side of IT, you’re naturally more vulnerable in the eyes of the company. You make things possible, but you don’t actually do the things. You have to bring something unique to the table so that you can’t be as easily replaced with a service.

Kenneth Fisher says this is more of the same:

Unfortunately as powerful as these machines became they were expensive, aged out quickly, required knowledgeable people to maintain and sometimes our tasks required more computing power than we had on hand. So some smart people got together and created something new. The Cloud. Someone else maintaining the computers, replacing parts as needed, updating software etc. And then renting out storage and computing power. (If at this point you guessed that I’m saying there are some fairly obvious parallels between the old mainframes and the cloud, well, you are correct.)

Andy Galbraith ties this back to April Fools jokes re: SQL on Linux:

I quietly ignored it and went about my life and job, putting off the problem until later.
Time passed and Microsoft released a “public preview”/CTP of what they began calling “SQL Server vNext” for Linux, and it became more real.  Then they released another, and another – as of this writing the current CTP is version 1.4 (download it here).
I recently realized I hadn’t progressed past my original query:
WHAT DO I DO KNOW?

John Morehouse has a bat:

I work for a fairly slow moving financial institution.  This does not me we don’t adopt new technology but the leadership is very careful when deciding to move in a certain direction. Since we service rural America farmers, these decisions could have a huge impact on the ability of our customers to operate.    The cloud, at least from a database perspective, is not something that I think is even on the radar.  I believe that we will get there eventually, but not in the next year or two I would imagine.

Of course, this also means that I don’t get the shiny new cloud toys to play with either.  I have had the ability to work with the cloud some years ago on a side project, but that was very limited.  It was also at a time where Azure was fairly young and not as robust as it is today. Learning new skills around the Cloud is on my to-do list and one of these days I’ll get to it.  I think with the help of MSDN, it’s a lot easier to play around with new technologies.

There are a lot of good posts on this topic this month.

Comments closed