Press "Enter" to skip to content

Curated SQL Posts

Microsoft R Client

Buck Woody discusses whether Microsoft R Client really is a client:

Enter the Microsoft R Client. It includes Microsoft R Open, and adds in some of the ScaleR functions, which makes processing data faster and more efficient. And again, it’s a full R environment – you can write and run code, right there on your desktop. But the important bit is that it can connect to a Microsoft R Server (MRS) by seting something called the “Compute Context“, which tells the R environment to run on a more powerful, scalable server environment, like you may be used to with SQL Server.

The naming is a bit of a head-scratcher, to be honest.

Comments closed

Notifying Different Sets Of Operators

Jason Clements has an interesting solution to the problem of user notification:

The other day I was asked How to notify multiple operators using database mail for failed jobs and a different operators for successes.

First I looked at the operator email addresses field [email protected];[email protected]….etc is not helping as there is a limit on the characters in the email name entry of operator and we still have the problem we need different groups for success and failures.

It makes perfect sense, but is non-trivial.  I like it.

Comments closed

Licensing Hyper-Converged Architectures

David Klee discusses SQL Server licensing on a unique (but growing in popularity) hardware architecture:

Now, look at the hyper-converged architectures. The CPUs used for I/O handling are the same as those that your VMs use to power your applications. A substantial portion of the host’s CPU power is now needed for I/O handling, and this activity comes first in the CPU scheduling queues.

This fact, by itself, is not necessarily a problem. Most virtualization host CPUs are relatively lightly utilized, and this amount of CPU needed for I/O handling is readily absorbed without causing a performance problem.

But, larger scale SQL Servers can read and write exceptionally large amounts of data around the clock. The I/O handling at the host layer can start to drain resources from the host. The additional activity scheduling time inside the hypervisor could be slowing down these SQL Servers without you knowing it.

As always, work out how your licensing would look under different alternatives or you could end up wasting a pretty penny.

Comments closed

Alternatives To Invoke-SQLCmd

Drew Furgiuele explains why he prefers not to use Invoke-SQLCmd and what he does instead:

I hope you’re wide awake and ready to get mad, because I have a very unpopular opinion regarding SQL Server and PowerShell: I don’t like using Invoke-SQLCmd . That’s not to say I haven’t used it; some of my older scripts on my GitHub use it but recently I have stopped. I think the cmdlet is of marginal use when compared to other methods to executing queries against a SQL Server database. Maybe you agree, but you probably don’t, so before you grab the torches and pitchforks and head to the comments, I encourage you to read on. It’s a little long, but I think it outlays my thoughts pretty well.

I think Drew makes a good point.  Read the whole thing.

Comments closed

Killing SSMS Using Powershell

Michael Bourgon shows how to kill an instance of SSMS using Powershell:

WMI in action!  I was running SSMS 2016 and it locked up on me.  So while I waited for it to become responsive I started up SSMS 2014. Still locked, but which do I kill in taskman? (The easy answer, is, of course: change the open query and then look in Applications, not Processes). I didn’t think of that, so used WMI.  You could also get fancy and figure out which is the oldest instance of SSMS and do it that way.

Pretty simple: Tell it the path (which is 130 for SSMS 2016), and kill that.

Click through for the script.  You can do more than just kill processes this way.  Think of a scenario in which you create a whitelist of valid processes and regularly check to see if anything outside that whitelist is running.  It’s a little more difficult to do than this script, but not that much tougher.

Comments closed

Cluster Shared Volumes

Edwin Sarmiento discusses Cluster Shared Volumes within Failover Cluster Instances:

CSVs still use the concept of a shared disk but with an added layer of abstraction. Instead of having the shared disk be accessible to only one WSFC node at a time, all of the WSFC nodes have their own logical paths to it. This is made possible thru the Server Message Block (SMB) protocol. Since the WSFC nodes are connected to each other thru the heartbeat network (it’s another reason to have a dedicated network for inter-node communication,) CSV can take advantage of this route to the shared storage to send I/O commands.

Don’t be confused. While all of the WSFC nodes have both physical and logical paths to the shared disk, only the node that owns the SQL Server clustered resource can own it, just like when using traditional shared disks. In other words, the owner node dictates how to send the I/O commands – either directly thru its own access path or thru the heartbeat network. This is why you won’t see any direct dependency on the shared disk if you look at the Dependency Report. It also means that the failover process will be faster with CSVs.

If you’re using SQL Server 2014 or later and have built a cluster, check this feature out and see if it fits in your environment.

Comments closed

Approximation Or Classification?

A blog post on the Algolytics blog discusses different approximation and classification models and when to use each:

Even if your target variable is a numeric one, sometimes it’s better to use classification methods instead of approximation ones. For instance if you have mostly zero target values and just a few non-zero values. Change the latter to 1, in this case you’ll have two categories: 1 (positive value of your target variable ) and 0. You can also split numerical variable into multiple subgroups : apartment prices for low, medium and high by equal subset width and predict them using classification algorithms. This process is called discretization.

Both types of models are common in machine learning, so a good understanding of when to use which is important.

Comments closed

Handling Delimited Strings

Ed Pollack looks at several methods of creating delimited strings, and then several methods of breaking out delimited strings:

The execution plan is cut off, but you can be assured that there are six more similar plans below the ones pictured here. These metrics are misleading as each loop doesn’t seem too bad, right? Just 9% of the subtree cost or a few hundred reads doesn’t seem too wild, but add up all of these costs and it becomes clear that this won’t scale. What if we had thousands of rows to iterate through? For 5,000 rows, we would be looking at about 147,995,000 reads! Not to mention a very, very long execution plan that is certain to make Management Studio crawl as it renders five thousand execution plans.

Alternatively, we could cache all of the data in a temp table first, and then pull it row-by-row. This would result in significantly less reads on the underlying sales data, outperforming cursors by a mile, but would still involve iterating through the temp table over and over. For the scenario of 5,000 rows, we’d still have an inefficient slog through a smaller data set, rather than crawling through lots of data. Regardless of method, it’s still navigating quicksand either way, with varying amounts of quicksand.

This is a good article and includes STRING_SPLIT and OPENJSON methods in SQL Server 2016, but does not include building your own CLR methods.  Check it out if you need to do any delimited string manipulation.

Comments closed

Writing Good Tests In R

Brian Rowe discusses testing strategy in R:

It’s not uncommon for tests to be written at the get-go and then forgotten about. Remember that as code changes or incorrect behavior is found, new tests need to be written or existing tests need to be modified. Possibly worse than having no tests is having a bunch of tests spitting out false positives. This is because humans are prone to habituation and desensitization. It’s easy to become habituated to false positives to the point where we no longer pay attention to them.

Temporarily disabling tests may be acceptable in the short term. A more strategic solution is to optimize your test writing. The easier it is to create and modify tests, the more likely they will be correct and continue to provide value. For my testing, I generally write code to automate a lot of wiring to verify results programmatically.

I started this article with almost no idea how to test R code.  I still don’t…but this article does help.  I recommend reading it if you want to write production-quality R code.

Comments closed

SSAS And The Monthly Release Cycle

Chris Webb shares some thoughts on what the monthly SSDT/SSMS release cycle could mean for Analysis Services:

While getting bugs fixed quickly is great, I really hope that this new focus on tooling means that we get new BI-related features in SSMS and SSDT. SSMS has, in particular, been shockingly bad at supporting BI developers: for example, I find it unbelievable that we still don’t have a DAX query window in SSMS when the language has been in SSAS Tabular since SQL Server 2012 (I know the community have filled the gap with DAX Studio, but that’s not the point). Aaron Nelson recently announced a Trello board where ideas for new features in SSMS can be debated but there’s very little BI-related stuff there at the time of writing.

One can hope.  The big problem is getting enough support to bump items up on the backlog, and for that, there’s a Trello board.

Comments closed