
Curated SQL Posts

Testing Analysis Services Cubes

Jens Vestergaard shows how to test Analysis Services cubes using a Visual Studio test project:

Unit testing in Visual Studio is actually not that hard and can save you a lot of pain down the road. The testing framework in Visual Studio offers extensive ways of executing batches of tests. You can group tests by Class, Duration, Outcome, Trait or Project.

When you right-click a test, you get the option to select how you want the tests in the Test Explorer to be grouped.

If you have an Analysis Services cube, definitely read this. Testing is a vital part of software development, and automating tests can save you significant time later.


Memory-Optimized Columnstore

Niko Neugebauer clears the air regarding memory-optimized columnstore tables:

I would like to dedicate this blog post to the Memory-Optimised (also known and LOVED as Hekaton) Columnstore Indexes and their limitations in SQL Server 2016.
Disclaimer: the Memory-Optimised Technology is a ground-breaking development, which will be truly appreciated only in the next couple of years, and it has its incredible use cases (and maybe I will be blogging more about this space in the next couple of months). But people need to understand that mapping InMemory Columnstore Indexes to disk-based Columnstore Indexes 1:1 is a very wrong idea, and that because InMemory technology is significantly younger and less stable than Columnstore Indexes, there are some very significant hidden cornerstones.

It’s important to read this post as “this is not yet a fully-mature product” rather than “this will always be worse.”  But it’s just as important to understand the limitations of the product and not think you’re getting something that you aren’t.


Analysis Services In Azure

Chris Webb looks at SSAS in Azure:

Support for multidimensional models will be considered for a future release, based on customer demand.

I’m pretty sure there will be plenty of demand for Multidimensional support given the installed base that’s out there.

I hope so.  Lack of multidimensional isn’t a deal-killer, but it’s a deal-harmer.


Growing Speakers

Andy Yun wants to plant speaker seeds:

This month’s topic is going to be about Speaking & Presenting with a focus on Helping New Speakers! 4 short years ago, I attended my very first PASS Summit and never did I think I’d ever dare to become a Speaker and present. But a year later, I got coerced into a lightning talk. Since then, I’ve presented at several dozen User Groups & SQL Saturdays. Tomorrow, I have the honor of presenting at PASS Summit 2016! And what an adventure it’s been!

For T-SQL Tuesday, I am giving different topics depending on whether you are currently a Speaker or have never spoken. And if you’ve never spoken, this T-SQL Tuesday comes with a challenge and a twist.

I think this is a wonderful idea.


Debugging Spark Code

Vida Ha has an article on troubleshooting when writing code using the Spark APIs:

When working with large datasets, you will have bad input that is malformed or not as you would expect it. I recommend being proactive about deciding for your use case, whether you can drop any bad input, or you want to try fixing and recovering, or otherwise investigating why your input data is bad.

A filter command is a great way to get only your good input points or only your bad input data (if you want to look into that more and debug). If you want to fix your input data, or drop it if you cannot, then using a flatMap() operation is a great way to accomplish that.
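
As a rough illustration of that pattern, here is a minimal plain-Python sketch rather than actual Spark code. The `parse_record` and `flat_map` helpers and the sample data are hypothetical, invented for this example. The parser returns a one-element list when a record is good or can be repaired, and an empty list when it has to be dropped, so flat-mapping it over the input fixes and filters in a single pass.

```python
from itertools import chain


def parse_record(line):
    """Return [value] for good or repairable input, [] for input to drop."""
    # Hypothetical repair step: trim whitespace and thousands separators.
    cleaned = line.strip().replace(",", "")
    try:
        return [int(cleaned)]
    except ValueError:
        # Unrecoverable input: an empty list contributes nothing to the output.
        return []


def flat_map(func, items):
    """Apply func to each item and flatten the resulting lists into one list."""
    return list(chain.from_iterable(func(item) for item in items))


raw = ["42", " 7 ", "1,000", "n/a", ""]

# Fix what can be fixed and drop the rest in one pass.
good = flat_map(parse_record, raw)

# A plain filter keeps only the bad records, for separate investigation.
bad = [line for line in raw if not parse_record(line)]

print(good)
print(bad)
```

In PySpark, the same `parse_record` function could presumably be passed to `rdd.flatMap(parse_record)`, with `rdd.filter(...)` used to isolate the bad records. The article's own examples may look quite different.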

This is a good set of tips.


Cloudera, Polybase, And Active Directory

Ajay Jagannathan shows how to integrate a SQL Server instance + Polybase with a Cloudera Hadoop cluster, all using Active Directory for accounts:

For all usernames and principals, we will use suffixes like Cluster14 for name-scalability.

  1. Active Directory setup:
     1. Install the OpenLDAP utilities (openldap-clients on RHEL/CentOS) on the Cloudera Manager server host. Install the Kerberos client (krb5-workstation on RHEL/CentOS) on all hosts of the cluster. This step requires an internet connection on the Hadoop server. If the server has no internet connection, you can download the RPM and install it manually.

This is absolutely worth the read.


Automated Emails

Allison Tharp shows how to send automated e-mails with Powershell:

The update has two parts: how I feel about my work and how I feel about my department.  For each of these, I wrote a few ‘beginning’ sentences and a few ‘ending’ sentences.  The script picks a random beginning and ending sentence for each category (work and department), color codes it, and sends the email to my personal and my work emails.
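
The original script is written in Powershell. The random-sentence idea can be sketched in Python as below, under several assumptions: the sentences, addresses, subject line and function names are all hypothetical placeholders, color coding is left out, and the message is only built, not sent.

```python
import random
from email.message import EmailMessage

# Hypothetical sentence pools; the original author's sentences are not shown.
BEGINNINGS = {
    "work": ["This week went smoothly.", "This week was a challenge."],
    "department": ["The team is in good spirits.", "The team is stretched thin."],
}
ENDINGS = {
    "work": ["I expect next week to be similar.", "I hope to catch up soon."],
    "department": ["Morale should hold steady.", "We could use more help."],
}


def build_update(rng=random):
    """Pick one random beginning and ending sentence for each category."""
    parts = []
    for category in ("work", "department"):
        beginning = rng.choice(BEGINNINGS[category])
        ending = rng.choice(ENDINGS[category])
        parts.append(f"{category.title()}: {beginning} {ending}")
    return "\n".join(parts)


def build_message(sender, recipients, body):
    """Wrap the update text in an email message without sending it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = "Status update"
    msg.set_content(body)
    return msg


message = build_message(
    "me@example.com",
    ["me@example.com", "me@work.example.com"],
    build_update(),
)
print(message)
```

Actually sending the message would need an SMTP server, for example through `smtplib.SMTP(...).send_message(message)`. The host and credentials depend on the environment and are not covered here.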

I love the randomization.


Deployment Contributors

Richie Lee discusses an alternative to pre-model scripts:

According to the blurb, deployment contributors can perform custom actions when deploying a SQL script. And one such use of deployment contributors would be to alter index builds to be an online operation. Microsoft also have a Github DACExtensions repo, and this is very useful because, in the interests of full disclosure, I have never written a deployment contributor myself. This is partly because the repo has some very good examples, including the online index issue (this post nicely covers how to make use of deployment contributors). I know people who have written them, and they have explained how they work very well. But I think there are a few challenges w/r/t deployment contributors:

  • No one has ever heard of them

  • You have to use C#

  • They’re not entirely straightforward.

This is a good discussion of deployment contributors, including why we don’t see them more frequently.


Subqueries And Performance

Grant Fritchey busts a myth:

I’ve written before about the concept of cargo cult data professionals. They see one issue, one time, and consequently extrapolate that to all issues, all the time. It’s the best explanation I have for why someone would suggest that a sub-query is flat out wrong and will hurt performance.

Let me put a caveat up front (which I will reiterate in the conclusion, just so we’re clear), there’s nothing magically good about sub-queries just like there is nothing magically evil about sub-queries. You can absolutely write a sub-query that performs horribly, does horrible things, runs badly, and therefore absolutely screws up your system. Just as you can with any kind of query. I am addressing the bad advice that a sub-query is to be avoided because they will inherently lead to poor performance.

There are times not to use subqueries, but this post is absolutely correct:  understand the reasons why things may or may not perform well, and don’t be afraid to try things out.
