2016-09-08 – Curated SQL

Let’s answer a different question first. What happens when the plan gets invalidated, when the index being used gets dropped or some other structural change occurs so that the plan is no longer valid? I answered that question in this blog post. The plan being forced, after the object is dropped, becomes invalid, so that plan can no longer be used. The Query Store still attempts to apply the plan during any recompile or compile event of the query in question, but it fails and a proper plan is used. All this means, I think, the Query Store is going to ignore the new index, since a new index doesn’t invalidate an existing plan. A new index just makes new plans possible. However, when I was asked this question, this wasn’t something I had tested, so I gave a speculative, best guess, answer with plenty of caveats and the promise to provide a tested answer ASAP. Here we go.

Read on for the full answer.

Comments closed

Pulsar

Published 2016-09-08 by Kevin Feasel

Alex Woodie reports that Yahoo is open sourcing Pulsar:

Pulsar uses Apache Bookkeeper (committed by Yahoo to open source in 2011) as its durable storage mechanism. “With Bookkeeper, applications can create many independent logs, called ledgers,” Pulsar’s project page on GitHub says. “A ledger is an append-only data structure with a single writer that is assigned to multiple storage nodes (or bookies) and whose entries are replicated to multiple of these nodes.”

Pulsar uses brokers to serve topics. Each topic is assigned to a broker, and an individual broker can serve thousands of topics, Yahoo says. “The broker accepts messages from writers, commits them to a durable store, and dispatches them to readers,” Yahoo says.

My biases would lead me to still go with Kafka over Pulsar, but it’d be interesting to see a good comp between the two.

Comments closed

Power BI And Impala

Published 2016-09-08 by Kevin Feasel

Justin Kestelyn describes the Impala Connector for Power BI Desktop:

Note that the connector currently only supports Import mode, which requires downloading the query output data to the local data model. In future updates, we will enhance the connector with DirectQuery capabilities, as well as with support for refresh scenarios via the Power BI Gateway. [Ed. Note: As of the August 2016 update, the Impala Connector also supports DirectQuery mode, which means you are always viewing the most up-to-date data. The functionality for both periodic refreshes and DirectQuery mode require the Power BI Gateway running either on-premise or in Microsoft Azure.]

Enabling Power BI connectivity to Impala has been a very frequently requested capability from our customers. We encourage you to give it a try and share with us any feedback or issues that you encounter via the “Send a Frown” feature in Power BI Desktop.

Good stuff.

Comments closed

Nested Loop Estimation

Published 2016-09-08 by Kevin Feasel

Kendra Little describes the nested loop operator in execution plans:

One of the first things we often learn when we’re looking at plans is that SQL Server uses estimates. And sometimes, those estimates are wrong. At first glance, this looks really wrong– it estimated 11.5 rows, and actually got 20,825 rows!

Read the whole thing to see how nested loop estimates which seem way off might not actually be.

Comments closed

Instance Segregation

Published 2016-09-08 by Kevin Feasel

Kennie Pontoppidan discusses why you don’t want your transactional system on the same instance as your warehouse:

Mixing OLTP and data warehouse workload types in the same SQL Server instance is not a good idea. Why? Well, because the workload types don’t mix. As described above, an ideal buffer cache for an OLTP system contains pages used in searches as well as pages for the most used customer/items etc. Running a few data warehouse queries will change the buffer cache to contain (mostly) all dimension data. This will make OLTP operations slow. On the other hand, the ideal buffer cache in a data warehouse will contain all dimension data, which makes aggregations fast because all dimension lookups will be done in memory. OLTP operations might flush out parts of this dimension data from the cache, hurting the performance for data warehouse queries. The two workload types will keep on fighting about the content of the buffer cache, making both systems suffer.

Except in very small systems (where it doesn’t much matter), this is sound advice.

Comments closed

Looking For SQL Saturday Sessions

Published 2016-09-08 by Kevin Feasel

Wayne Sheffield has a Powershell cmdlet to find sessions matching SQL Saturday topics:

The output to the screen is the Event #, Speaker, Session Title and URL for the presentation.

The generated export file will also include the event name and session abstract. Additionally, the URL will be encased with the Excel HYPERLINK() function. When the export file is opened up with Excel, clicking the URL will open your browser to the session information, where the presentation material can be downloaded.

I hope that you get a lot of use out of this script.

Click through for the script and more details.

Comments closed

Ambari And Active Directory

Published 2016-09-08 by Kevin Feasel

Jon Morisi documents his efforts in getting Ambari to play nicely with Active Directory over Kerberos:

You then need to trust the certificate on all the linux hosts
From the IBM article:

Create ‘/etc/pki/ca-trust/source/anchors/activedirectory.pem’ and paste the certificate contents
Trust CA cert: sudo update-ca-trust enable; sudo update-ca-trust extract; sudo update-ca-trust check
Trust CA cert in Java:
mycert=/etc/pki/ca-trust/source/anchors/activedirectory.pem sudo keytool -importcert -noprompt -storepass changeit -file ${mycert} -alias ad -keystore /etc/pki/java/cacerts
And at last, please make sure every node on your cluster has access to the ad host.

LDAP support is a key part of setting up a production Hadoop cluster.

Comments closed

Chaos Sloth

Published 2016-09-08 by Kevin Feasel

Erik Darling has created a script to make your servers go slow:

It randomly generates values and changes some important configuration settings.

Max Degree of Parallelism

Cost Threshold

Max Memory

Database compatibility level

This was written for SQL Server 2016, on a box that had 384 GB of RAM. If your specs don’t line up, you may have to change the seed values here. I’m not putting any more development into this thing to automatically detect SQL version or memory in the server, because this was a one-off joke script to see how bad things could get.

How bad did they get? The server crashed multiple times.

Not for production purposes. Or maybe any purposes…

Comments closed

Try-Catch Blocks

Published 2016-09-08 by Kevin Feasel

Dave Mason discusses a major problem with try-catch blocks:

We didn’t just lose one error. We lost them all! DBCC CHECKDB on a corrupt database just came back “clean” and the CATCH block never executed. We’ve moved into dangerous territory now. This is no longer an inconvenience or an annoyance. This could lead to data loss, financial harm to your company, maybe even some risk to your career.

Dave promises a better approach in his next post.

Comments closed

Table Heatmaps

Published 2016-09-08 by Kevin Feasel

Devin Knight continues his Power BI custom visuals series:

In the Table Heatmap the color of the boxes is determined by the value in your measure.
Only 1 category field can be used, which will dynamically generate the number of columns based on the number of distinct values your field has.
The number and types of colors can be changed using some of the settings we’ll discuss below.

I can see the table heatmap being a good visual for calendars.

Comments closed

M	T	W	T	F	S	S
			1	2	3	4
5	6	7	8	9	10	11
12	13	14	15	16	17	18
19	20	21	22	23	24	25
26	27	28	29	30

Day: September 8, 2016

Query Store And Better Plans

Pulsar

Power BI And Impala

Nested Loop Estimation

Instance Segregation

Looking For SQL Saturday Sessions

Ambari And Active Directory

Chaos Sloth

Try-Catch Blocks

Table Heatmaps