Curated SQL Posts

Cloudera, PolyBase, And Active Directory

Ajay Jagannathan shows how to integrate a SQL Server instance running PolyBase with a Cloudera Hadoop cluster, all using Active Directory for accounts:

For all usernames and principals, we will use suffixes like Cluster14 for name scalability.

  1. Active Directory setup:
  1. Install the OpenLDAP utilities (openldap-clients on RHEL/CentOS) on the Cloudera Manager server host. Install the Kerberos client (krb5-workstation on RHEL/CentOS) on all hosts of the cluster. This step requires an internet connection on the Hadoop server. If the server has no internet connection, you can download the RPM and install it manually.

This is absolutely worth the read.

Automated Emails

Allison Tharp shows how to send automated e-mails with PowerShell:

The update has two parts: how I feel about my work and how I feel about my department.  For each of these, I wrote a few ‘beginning’ sentences and a few ‘ending’ sentences.  The script picks a random beginning and ending sentence for each category (work and department), color codes it, and sends the email to my personal and my work emails.

I love the randomization.
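
The post itself uses PowerShell and its sentences are not reproduced here, so the following is only a rough Python sketch of the same idea: pick one random beginning and one random ending per category, color code each category, and build the e-mail. The sentence pools, colors, subject line, and addresses are all made-up placeholders, not taken from the original script.

```python
import random
from email.message import EmailMessage

# Hypothetical sentence pools and colors; the original post's text is not shown.
SENTENCES = {
    "work": {
        "beginnings": ["My work went smoothly this week.", "My work was a grind this week."],
        "endings": ["I expect more of the same.", "I hope next week is calmer."],
        "color": "#1f77b4",
    },
    "department": {
        "beginnings": ["The department is in good shape.", "The department is stretched thin."],
        "endings": ["Morale seems steady.", "We could use more help."],
        "color": "#2ca02c",
    },
}


def build_update(rng):
    """Return an HTML body with one random beginning and ending per category."""
    paragraphs = []
    for category, parts in SENTENCES.items():
        beginning = rng.choice(parts["beginnings"])
        ending = rng.choice(parts["endings"])
        paragraphs.append(
            f'<p style="color:{parts["color"]}">'
            f"<b>{category.title()}:</b> {beginning} {ending}</p>"
        )
    return "\n".join(paragraphs)


def build_message(sender, recipients, rng=None):
    """Build (but do not send) the update e-mail for the given recipients."""
    rng = rng or random.Random()
    msg = EmailMessage()
    msg["Subject"] = "Weekly update"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("This update is best viewed as HTML.")
    msg.add_alternative(build_update(rng), subtype="html")
    return msg
```

Sending is left out on purpose; with a reachable mail server, something like `smtplib.SMTP(host).send_message(msg)` would deliver it, where `host` is whatever server you have access to.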

Deployment Contributors

Richie Lee discusses an alternative to pre-model scripts:

According to the blurb, deployment contributors can perform custom actions when deploying a SQL script. And one such use of deployment contributors would be to alter index builds to be an online operation. Microsoft also have a GitHub DACExtensions repo, and this is very useful because, and in the interests of full disclosure, I have never written a deployment contributor myself. This is partly because the repo has some very good examples, including the online index issue (this post nicely covers how to make use of deployment contributors). I know those that have and have explained how they work very well. But I think there are a few challenges w/r/t deployment contributors:

  • No one has ever heard of them

  • You have to use C#

  • They’re not entirely straightforward.

This is a good discussion of deployment contributors, including why we don’t see them more frequently.

Subqueries And Performance

Grant Fritchey busts a myth:

I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.

Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear), there’s nothing magically good about sub-queries just like there is nothing magically evil about sub-queries. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that a sub-query is to be avoided because they will inherently lead to poor performance.

There are times not to use subqueries, but this post is absolutely correct:  understand the reasons why things may or may not perform well, and don’t be afraid to try things out.
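
In the spirit of trying things out, here is a small sketch that runs a subquery and an equivalent join side by side and captures each query plan. It uses Python's built-in sqlite3 module as a stand-in, not SQL Server, so the plans will not match what SQL Server produces. The tables and data are invented for illustration and are not from Grant's post.

```python
import sqlite3


def compare_queries():
    """Run a subquery and an equivalent join; return rows and plan for each."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data, just large enough to show the pattern.
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 25.0), (3, 3, 5.0);
    """)
    subquery = """
        SELECT name FROM customers
        WHERE id IN (SELECT customer_id FROM orders)
        ORDER BY name
    """
    join = """
        SELECT DISTINCT c.name FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        ORDER BY c.name
    """
    results = {}
    for label, sql in (("subquery", subquery), ("join", join)):
        rows = [row[0] for row in conn.execute(sql)]
        # The last column of EXPLAIN QUERY PLAN output is the readable detail.
        plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        results[label] = (rows, plan)
    conn.close()
    return results


for label, (rows, plan) in compare_queries().items():
    print(label, rows, plan)
```

Both forms return the same customers. On SQL Server, the equivalent step would be to compare actual execution plans and runtime statistics instead of assuming one form is slower.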

Synchronicity

Kenneth Fisher discusses synchronous versus asynchronous in programming terms:

Synchronous – Code that runs one line at a time. Each line of code is completed before the next one starts. If an external call is made then it is completed before the next line of code runs.

Asynchronous – Code that is launched and runs separately from the initial code. If a SQL job is launched from inside a batch of code (using sp_start_job for example) then the job runs in parallel with (at the same time as) the remainder of the batch of code.

Understanding which operations are synchronous versus asynchronous, and which operations are blocking versus non-blocking versus semi-blocking, will do wonders for improving application performance.
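
To make the distinction concrete, here is a minimal Python sketch, assuming a thread is an acceptable stand-in for a launched SQL job. The `external_call` function is hypothetical and just sleeps briefly; nothing here calls sp_start_job or SQL Server.

```python
import threading
import time


def external_call(log, label):
    """Hypothetical slow operation standing in for an external call or job."""
    time.sleep(0.1)
    log.append(f"{label} finished")


def run_synchronously():
    """The external call completes before the next line runs."""
    log = []
    external_call(log, "call")
    log.append("next line")
    return log


def run_asynchronously():
    """The job is launched, and the next line runs while it is still working."""
    log = []
    worker = threading.Thread(target=external_call, args=(log, "job"))
    worker.start()           # returns immediately, like launching a job
    log.append("next line")  # runs while the job is still sleeping
    worker.join()            # wait here only so the log is complete
    return log


print(run_synchronously())
print(run_asynchronously())
```

In the synchronous version the call's entry is logged first; in the asynchronous version "next line" is logged before the job finishes, because the launching code does not wait.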

Minimal Logging With Columnstore

Niko Neugebauer continues his columnstore series by looking at columnstore insert logging in SQL Server 2016 versus 2014:

Ladies and gentlemen! That’s quite a difference to SQL Server 2014!
We better check the total length of the transaction log to see the final result: 384.032 bytes! Ok, that is significantly more than for the rowstore heap table for sure, but what about the comparison to SQL Server 2014? Did this minimal logging bring any improvement?
Well … 🙂
In SQL Server 2014 we had 1.255.224 bytes spent on the transaction log – meaning over 1.2 MB, meaning around 3 times more, for the Delta-Store insertion! For such a simple table, this is a huge improvement, but let’s take a look at the total length of the transaction log entries in both environments (SQL Server 2014 & SQL Server 2016)

This is worth a careful read.  If you’ve spent time working with 2014 clustered columnstore indexes, there are a few changes which might affect you.  The most interesting thing for me was that the deltastore is no longer page compressed.
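
As a quick check on the "around 3 times" figure, the two log sizes quoted above can be compared directly. The quote uses a period as the thousands separator, so the values are 1,255,224 bytes for 2014 and 384,032 bytes for 2016. The function name below is just for illustration.

```python
# Transaction log bytes for the Delta-Store insert, as quoted in the post.
LOG_BYTES_2014 = 1_255_224
LOG_BYTES_2016 = 384_032


def log_comparison(before, after):
    """Return (how many times larger 'before' is, percent of log saved)."""
    ratio = before / after
    saved_pct = 100 * (before - after) / before
    return ratio, saved_pct


ratio, saved_pct = log_comparison(LOG_BYTES_2014, LOG_BYTES_2016)
print(f"2014 wrote {ratio:.2f}x the log of 2016 ({saved_pct:.1f}% less in 2016)")
```

The ratio comes out a little above 3, which matches the rough figure in the quote.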

Automatic Variables In PowerShell

Constantine Kokkinos has a list of automatic variables in PowerShell:

Been a few days of learning since I last wrote one of these, but I have come back to the automatic variables page on the PowerShell documentation enough times that I think I should just blog the important parts for myself.

  • $? – TRUE/FALSE if the last thing you did succeeded.

  • $_ – Something everyone uses in posh, current pipeline object.

  • $Args – all the undeclared params passed to a function, try to avoid.

  • $Error – the array of error objects that represent a stack of the most recent errors. Use $Error[0] to get the most recent error.

Read on for more variables.

Upgrading SSMS

Melissa Coates argues that you should upgrade to the latest version of Management Studio:

SSMS is supported for managing SQL Server 2008 through 2016 (except for SSIS instances which sadly still require a version-specific SSMS at the time of this writing). If you manage numerous servers on different versions, this unification is fantastic. There is partial support for managing pre-2008 instances. And, of course as you’d expect, the newest SSMS release supports various new features in SQL Server 2016 such as Query Statistics, Live Query Plans, Compare Showplan, Security Policies for row-level security, and so on with all the new 2016 goodies we have.

SSMS also supports managing Azure components such as Azure SQL Database and Azure SQL Data Warehouse, as well as hybrid cloud features such as StretchDB and Backup to URL. This additional unification is really, really handy.

I have a copy of SSMS 16 for reading Query Store, but not all of my plugins have been updated yet, so I’m still living in SSMS 2014 for now.
