Query Store And Better Plans

Grant Fritchey asks, what happens when you force a Query Store plan and later a better plan becomes available?

Let’s answer a different question first. What happens when the plan gets invalidated, when the index being used gets dropped or some other structural change occurs so that the plan is no longer valid? I answered that question in this blog post. The plan being forced, after the object is dropped, becomes invalid, so that plan can no longer be used. The Query Store still attempts to apply the plan during any recompile or compile event of the query in question, but it fails and a proper plan is used. All this means, I think, the Query Store is going to ignore the new index, since a new index doesn’t invalidate an existing plan. A new index just makes new plans possible. However, when I was asked this question, this wasn’t something I had tested, so I gave a speculative, best guess, answer with plenty of caveats and the promise to provide a tested answer ASAP. Here we go.

Read on for the full answer.

Pulsar

Kevin Feasel

2016-09-08

Hadoop

Alex Woodie reports that Yahoo is open sourcing Pulsar:

Pulsar uses Apache Bookkeeper (committed by Yahoo to open source in 2011) as its durable storage mechanism. “With Bookkeeper, applications can create many independent logs, called ledgers,” Pulsar’s project page on GitHub says. “A ledger is an append-only data structure with a single writer that is assigned to multiple storage nodes (or bookies) and whose entries are replicated to multiple of these nodes.”

Pulsar uses brokers to serve topics. Each topic is assigned to a broker, and an individual broker can serve thousands of topics, Yahoo says. “The broker accepts messages from writers, commits them to a durable store, and dispatches them to readers,” Yahoo says.

My biases would lead me to still go with Kafka over Pulsar, but it’d be interesting to see a good comp between the two.

Power BI And Impala

Justin Kestelyn describes the Impala Connector for Power BI Desktop:

Note that the connector currently only supports Import mode, which requires downloading the query output data to the local data model. In future updates, we will enhance the connector with DirectQuery capabilities, as well as with support for refresh scenarios via the Power BI Gateway. [Ed. Note: As of the August 2016 update, the Impala Connector also supports DirectQuery mode, which means you are always viewing the most up-to-date data. The functionality for both periodic refreshes and DirectQuery mode require the Power BI Gateway running either on-premise or in Microsoft Azure.]

Enabling Power BI connectivity to Impala has been a very frequently requested capability from our customers. We encourage you to give it a try and share with us any feedback or issues that you encounter via the “Send a Frown” feature in Power BI Desktop.

Good stuff.

Nested Loop Estimation

Kendra Little describes the nested loop operator in execution plans:

One of the first things we often learn when we’re looking at plans is that SQL Server uses estimates. And sometimes, those estimates are wrong. At first glance, this looks really wrong– it estimated 11.5 rows, and actually got 20,825 rows!

Read the whole thing to see how nested loop estimates which seem way off might not actually be.

Instance Segregation

Kennie Pontoppidan discusses why you don’t want your transactional system on the same instance as your warehouse:

Mixing OLTP and data warehouse workload types in the same SQL Server instance is not a good idea. Why? Well, because the workload types don’t mix. As described above, an ideal buffer cache for an OLTP system contains pages used in searches as well as pages for the most used customer/items etc. Running a few data warehouse queries will change the buffer cache to contain (mostly) all dimension data. This will make OLTP operations slow. On the other hand, the ideal buffer cache in a data warehouse will contain all dimension data, which makes aggregations fast because all dimension lookups will be done in memory. OLTP operations might flush out parts of this dimension data from the cache, hurting the performance for data warehouse queries. The two workload types will keep on fighting about the content of the buffer cache, making both systems suffer.

Except in very small systems (where it doesn’t much matter), this is sound advice.

Looking For SQL Saturday Sessions

Wayne Sheffield has a Powershell cmdlet to find sessions matching SQL Saturday topics:

The output to the screen is the Event #, Speaker, Session Title and URL for the presentation.

The generated export file will also include the event name and session abstract. Additionally, the URL will be encased with the Excel HYPERLINK() function. When the export file is opened up with Excel, clicking the URL will open your browser to the session information, where the presentation material can be downloaded.

I hope that you get a lot of use out of this script.

Click through for the script and more details.

Ambari And Active Directory

Jon Morisi documents his efforts in getting Ambari to play nicely with Active Directory over Kerberos:

You then need to trust the certificate on all the linux hosts
From the IBM article:

  1. Create ‘/etc/pki/ca-trust/source/anchors/activedirectory.pem’ and paste the certificate contents

  2. Trust CA cert: sudo update-ca-trust enable; sudo update-ca-trust extract; sudo update-ca-trust check

  3. Trust CA cert in Java:

  4. mycert=/etc/pki/ca-trust/source/anchors/activedirectory.pem sudo keytool -importcert -noprompt -storepass changeit -file ${mycert} -alias ad -keystore /etc/pki/java/cacerts

  5. And at last, please make sure every node on your cluster has access to the ad host.

LDAP support is a key part of setting up a production Hadoop cluster.

Chaos Sloth

Erik Darling has created a script to make your servers go slow:

It randomly generates values and changes some important configuration settings.

  • Max Degree of Parallelism
  • Cost Threshold
  • Max Memory
  • Database compatibility level

This was written for SQL Server 2016, on a box that had 384 GB of RAM. If your specs don’t line up, you may have to change the seed values here. I’m not putting any more development into this thing to automatically detect SQL version or memory in the server, because this was a one-off joke script to see how bad things could get.

How bad did they get? The server crashed multiple times.

Not for production purposes.  Or maybe any purposes…

Try-Catch Blocks

Dave Mason discusses a major problem with try-catch blocks:

We didn’t just lose one error. We lost them all! DBCC CHECKDB on a corrupt database just came back “clean” and the CATCH block never executed. We’ve moved into dangerous territory now. This is no longer an inconvenience or an annoyance. This could lead to data loss, financial harm to your company, maybe even some risk to your career.

Dave promises a better approach in his next post.

Table Heatmaps

Devin Knight continues his Power BI custom visuals series:

  • In the Table Heatmap the color of the boxes is determined by the value in your measure.

  • Only 1 category field can be used, which will dynamically generate the number of columns based on the number of distinct values your field has.

  • The number and types of colors can be changed using some of the settings we’ll discuss below.

I can see the table heatmap being a good visual for calendars.

Categories

September 2016
MTWTFSS
« Aug Oct »
 1234
567891011
12131415161718
19202122232425
2627282930