Curated SQL Posts

Causing tempdb Spills

Kendra Little shows us a quick and easy way to cause tempdb spills:

Sometimes it’s useful to know how to cause a problem.

Maybe you’ve never encountered the problem, and want to get hands-on experience. Maybe you’re testing a monitoring tool, and want to see if a condition flags an alert. Maybe you’re testing out a new client tool, and want to see how it displays it.

I recently was going through some demos in SQL Operations Studio, and I found that a spill on a sort operator wasn’t causing a warning to visibly show in the graphic execution plan.

I wanted to file an issue on this and let the Ops Studio team know that would be helpful – but my demo code was somewhat complex and required restoring a rather large database. So I set up a quick code sample to cause a spill that could be run in any database.

It’s important to know how to cause problems if you want to make sure you’ve solved them correctly.

Comments closed

Ambari Architecture

The folks at Data Flair have a tutorial on how Ambari is architected:

Ambari uses a master/slave architecture. The master node instructs the slave nodes to perform certain actions and to report back the state of every action. The master node is also responsible for keeping track of the state of the infrastructure. For this, the master node uses a database server, which can be configured further at setup time.

Now, we can see the high-level architecture of Ambari in the diagram below, which also shows how Ambari works:

Ambari is one of the easiest ways I’ve seen to spin up and manage a Hadoop cluster.
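
The master/agent flow described in the excerpt can be sketched as a toy model. This is only an illustration of the pattern, not Ambari's API: the `Master` and `Agent` classes, the `action_state` table, and the `START_DATANODE` command name are all hypothetical, and an in-memory SQLite database stands in for the database server the master uses.

```python
import sqlite3


class Agent:
    """Hypothetical worker node: runs a command and reports the outcome."""

    def __init__(self, host):
        self.host = host

    def execute(self, command):
        # A real agent would start or stop services; this toy always succeeds.
        return {"host": self.host, "command": command, "status": "COMPLETED"}


class Master:
    """Hypothetical master node: issues commands and records state in a database."""

    def __init__(self, agents):
        self.agents = agents
        # Assumption: an in-memory SQLite database stands in for the real server.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE action_state (host TEXT, command TEXT, status TEXT)"
        )

    def run(self, command):
        # Instruct every agent, then persist what each one reports back.
        for agent in self.agents:
            report = agent.execute(command)
            self.db.execute(
                "INSERT INTO action_state VALUES (?, ?, ?)",
                (report["host"], report["command"], report["status"]),
            )
        self.db.commit()

    def state(self):
        return self.db.execute(
            "SELECT host, command, status FROM action_state ORDER BY host"
        ).fetchall()


master = Master([Agent("node1"), Agent("node2")])
master.run("START_DATANODE")
print(master.state())
```

The point of the sketch is the division of labor: agents only execute and report, while the master alone owns the record of cluster state.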

Matrix Math In R

Dave Mason continues his series on matrices in R:

Math operations between matrices are possible too. Here, the same matrix is added to itself. Since it’s the same matrix, both operands obviously have the same number of elements. The first element is added to the first element, the second element is added to the second element, etc.

> #Add two matrices.
> some_numbers + some_numbers
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    4    6    8   10   12
[2,]   14   16   18   20   22   24
[3,]   26   28   30   32   34   36
[4,]   38   40   42   44   46   48

This follows from Dave’s prior posts, but you can see some of the pieces start to fit together.
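
For readers who do not have R at hand, the same element-wise addition can be sketched in plain Python. This is an analogue of the R snippet, not Dave's code: it assumes `some_numbers` is a 4x6 matrix holding 1 through 24 filled row by row (inferred from the output in the excerpt), and `add_matrices` is a hypothetical helper name.

```python
# Assumption: a 4x6 matrix holding 1..24, filled row by row, as nested lists.
ROWS, COLS = 4, 6
some_numbers = [[r * COLS + c + 1 for c in range(COLS)] for r in range(ROWS)]


def add_matrices(a, b):
    """Add two same-shaped matrices element by element."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same dimensions")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


doubled = add_matrices(some_numbers, some_numbers)
for row in doubled:
    # Right-align each value so the columns line up when printed.
    print(" ".join(f"{value:3d}" for value in row))
```

The explicit shape check mirrors the requirement in the excerpt that both matrices have the same number of elements.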

Kerberos And SQL Server

Kathi Kellenberger digs into Kerberos:

2. Why is Kerberos needed for SQL Server?

When NTLM is used, the client, for example a user logged into a laptop, contacts a domain controller when requesting access to a resource in the network. This resource could be an SSRS report, for example. When using NTLM, the user proves their identity to the SSRS server. Unfortunately, the SSRS server cannot forward the credentials of the user along to the database server. The database server will deny the request, and the end user will see an error message. This is common with SSRS but will also be seen whenever resources are needed involving multiple servers.

When Kerberos is properly configured, the SSRS server can pass along confirmation of the identity of the requester to the database server via the ticket. If the login of the original requester has permission to select the data, it’s returned to the SSRS server, and the report is delivered.

Even if you are not using SSRS, you can run into issues when Kerberos is not configured properly. For example, when SPNs are misconfigured, you will often see error messages when trying to connect to SQL Server using SSMS (SQL Server Management Studio) while logged into another server.

Having a good understanding of Kerberos can save you configuration headaches when going between servers.

NFL Player Stats In Power BI

Dustin Ryan shares his NFL player stats and analysis Power BI desktop file:

I’ve had a lot of people ask me for this over the past few months and it’s finally (mostly) ready! There are still a few things I’d like to do with the data models and reports but I wanted to go ahead and get the content shared out since I know many people use this for the Fantasy Football drafts which generally happen during the third week of the NFL preseason.

So here it is. I’ve spent a decent amount of time scraping the data from a few different websites in order to put something together I thought would be useful and fun, so please take a look and enjoy it!

Click through for the file and a YouTube video with more info.

Moving Data Between Data Lakes

Jeffrey Verheul shows us how to use AdlCopy to migrate data from one Azure Data Lake to another:

Migrating data from one Data Lake to the other
We started out with a test version of a Data Lake, and this week I needed to migrate data to the production version of our Data Lake. After a lot of trial and error I couldn’t find a good way to migrate data. In the end I found a tool called AdlCopy. This is a command-line tool that copies files for you. Let me show you how easy it is.

Download & Install
AdlCopy needs to be installed on your machine. You can find the download here. By default the tool will install the files in “C:\Users\\Documents\AdlCopy\”, but this can be changed in the setup wizard.

Click through to see how to use this tool.

Creating Timelines With dbatools

Marcin Gminski shows how to pull SQL Agent and backup history out of SQL Server and display it as a visual history timeline:

Currently, the output from the following commands is supported:

  • Get-DbaAgentJobHistory
  • Get-DbaBackupHistory

You will run the above commands as you would normally do but pipe the output to ConvertTo-DbaTimeline, the same way as you would with any other ConvertTo-* PowerShell function. The output is a string that most of the time you will save as a file using the Out-File command in order to open it in a browser.

Then, with the ConvertTo-DbaTimeline cmdlet, you can convert that into an HTML page which looks pretty good.

The Basics Of DAX

Matthew Brice walks us through filters and calculations in DAX:

CALCULATE is somewhat unique in that it evaluates the 2nd, 3rd, …nth parameter first, and evaluates the first parameter last using values from my Filter Context Box. I think it is extremely helpful to list briefly the steps CALCULATE performs whenever it is invoked. (So maybe we are not at 10,000 feet, but 5,000?)

The CALCULATE function performs the following operations:

  1. Create a new filter context by cloning the existing one. (***Important visual step!***)

  2. Move rows in the row context to the new clone filter context box one by one, replacing filters if they reference the same column. (We will ignore this step for this post)

  3. Evaluate each filter argument to CALCULATE in the old filter context and then add column filters to the new clone filter context box one by one, replacing column filters if they reference the same column.

  4. Evaluate the first argument in the newly constructed filter context.

  5. Destroy this newly created, cloned filter context box before moving on to calculating the next “cell.”
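
The steps above can be sketched as a toy model in Python. This is not how the DAX engine is implemented; it only mimics the order of operations in the list. The `SALES` rows, the `calculate` and `total_amount` functions, and the representation of a filter context as a dictionary of column name to allowed values are all assumptions made for illustration.

```python
# Toy model: a filter context maps a column name to the set of allowed values.
SALES = [
    {"Color": "Red", "Year": 2017, "Amount": 10},
    {"Color": "Red", "Year": 2018, "Amount": 20},
    {"Color": "Blue", "Year": 2017, "Amount": 5},
    {"Color": "Blue", "Year": 2018, "Amount": 7},
]


def visible_rows(context):
    """Rows that satisfy every column filter in the context."""
    return [
        row for row in SALES
        if all(row[col] in allowed for col, allowed in context.items())
    ]


def total_amount(context):
    """Stand-in for a measure: sum Amount over the visible rows."""
    return sum(row["Amount"] for row in visible_rows(context))


def calculate(expression, outer_context, *filter_args):
    # Step 1: clone the existing filter context.
    new_context = {col: set(vals) for col, vals in outer_context.items()}
    # Step 2 (moving the row context across) is skipped, as in the post.
    # Step 3: evaluate each filter argument in the OLD context, then add it
    # to the clone, replacing any existing filter on the same column.
    for filter_arg in filter_args:
        column, allowed = filter_arg(outer_context)
        new_context[column] = set(allowed)
    # Step 4: evaluate the first argument in the newly built context.
    result = expression(new_context)
    # Step 5: the clone is discarded on return; the outer context is untouched.
    return result


def all_colors(outer_context):
    """Hypothetical filter argument that replaces the Color filter."""
    return ("Color", {"Red", "Blue"})


cell_context = {"Color": {"Red"}, "Year": {2018}}  # one "cell" of a report
print(total_amount(cell_context))                         # Red in 2018 only
print(calculate(total_amount, cell_context, all_colors))  # all colors in 2018
```

Note that the first argument is passed as a function rather than a value, which is what lets the sketch evaluate it last, inside the new context.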

If you’re interested in getting started with DAX, this is a good place to begin.

Dealing With Multicollinearity With R

Chaitanya Sagar explains the concept of multicollinearity in linear regressions and how we can mitigate this issue in R:

Perfect multicollinearity occurs when one independent variable is an exact linear combination of other variables. For example, you already have X and Y as independent variables and you add another variable, Z = a*X + b*Y, to the set of independent variables. Now, this new variable, Z, adds no information beyond what X and Y already provide. The model can adjust its parameters so that this combination is accounted for when the coefficients are determined.
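
One way to see the effect of Z = a*X + b*Y numerically is to check the determinant of X'X, which is zero when the columns are perfectly collinear, so the least-squares normal equations no longer have a unique solution. The sketch below uses Python rather than the article's R, and the data values, the choice a = 2, b = 3, and the helper names `gram_matrix` and `determinant` are all made up for illustration.

```python
from fractions import Fraction


def gram_matrix(columns):
    """Return X'X for a design matrix given as a list of columns."""
    return [
        [sum(Fraction(p) * q for p, q in zip(ci, cj)) for cj in columns]
        for ci in columns
    ]


def determinant(matrix):
    """Exact determinant via Gaussian elimination on Fractions."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for i in range(n):
        # Find a row at or below i with a non-zero entry in column i.
        pivot = next((r for r in range(i, n) if m[r][i] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det  # a row swap flips the sign
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return det


# Made-up data: x and y are not collinear, z is an exact combination of both.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]
a, b = 2, 3
z = [a * xi + b * yi for xi, yi in zip(x, y)]

print(determinant(gram_matrix([x, y])))     # non-zero: X'X is invertible
print(determinant(gram_matrix([x, y, z])))  # zero: X'X is singular
```

Exact fractions are used instead of floats so that the singular case comes out as exactly zero rather than a tiny rounding residue.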

Multicollinearity may arise from several factors. Inclusion or incorrect use of dummy variables in the system may lead to multicollinearity. Another reason could be the usage of derived variables, i.e., one variable is computed from other variables in the system. This is similar to the example we took at the beginning of the article. A further reason could be taking variables which are similar in nature, which provide similar information, or which have very high correlation with each other.

Multicollinearity can make regression analysis trickier, and it’s worth knowing about.  H/T R-bloggers.

When Cassandra Makes Sense

Anmol Sarna explains the pros and cons of using Apache Cassandra:

But as we know, nothing is perfect, and neither is the Cassandra Database. What I mean by this is that you cannot have a perfect package. If you wish for one brilliant feature then you might have to compromise on the other features. In today’s blog, we will be going through some of the benefits of selecting Cassandra as your database as well as the problems/drawbacks that one might face if he/she chooses Cassandra for his/her application.
I have also written some blogs earlier which you can go through for reference if you want to know what Cassandra is, how to set it up, and how it performs its reads and writes.

The only question we have is whether or not we should pick Cassandra over the other databases that are available. So let’s start by having a quick look at when to use the Cassandra Database. This will give a clear picture to all those who are confused in deciding whether to give Cassandra a try or not.

This is a level-headed analysis of Cassandra, so check it out.
