Press "Enter" to skip to content

Author: Kevin Feasel

Forcing Polybase External Pushdown

I have a post showing how to control predicate pushdown in Polybase:

As a reminder, in order to allow predicate pushdown to occur, we need to hit a Hadoop cluster; we can’t use predicate pushdown on other systems like Azure Blob Storage.  Second, we need to have a resource manager link set up in our external data source.  Third, we need to make sure that everything is configured correctly on the Polybase side.  But once you have those items in place, it’s possible to use the FORCE EXTERNALPUSHDOWN command like so:

There’s also discussion of preventing MapReduce job creation as well as a pushdown-related error I had received in the past.

Comments closed

Memory-Optimized Transaction Logging

Raul Gonzalez looks at how transaction logging works with memory-optimized tables:

Now we need double the rows, because for each row we’ve said it’s been deleted, we have to tell SQL Server that was not actually deleted (COMPENSATION due to ROLLBACK) in case of recovery (crash recovery or backup recovery). That’s so bad.

But not everything is lost yet :) let’s check how the In-Memory engine deal with this problem

Memory-optimized tables are pretty neat.

Comments closed

Forcing The Legacy Cardinality Estimator

SQL Scotsman has three methods for using the legacy Cardinality Estimator in specific circumstances even after you’ve switched over to the new:

The problem with lowering the database compatibility level is that you can’t leverage the new engine functionality available under the latest compatibility level.

This problem was solved in SQL Server 2016 with the introduction of Database Scoped Configurations which gives you the ability to make several database-level configuration changes for properties that were previously configured at the instance-level.  In particular, the LEGACY_CARDINALITY_ESTIMATION database scoped configuration allows you to set the cardinality estimation model independent of the database compatibility level.This option allows you to leverage all new functionality provided with compatibility level 130 but still use the legacy CE in the odd chance that the latest CE casuses severe query regressions across your workload.

Each has its own specific use cases, so it’s good to know all three.

Comments closed

Capacity Planning

Erin Stellato walks through capacity planning:

Capacity planning for a new solution is really tricky.  You have to come up with estimates about workload based on information you collect from the business.  This means you have to ask hard questions about how much data they will expect in the first month, the first six months, and the first year.  When a new solution is coming in, this is often the LAST thing the business is thinking about, so very often you’re going to get vague answers.  In the case of a new solution, you really have make a best guess effort.  Don’t pull your hair out trying to get exact numbers.

If the solution is from a vendor, you must ask the vendor for planning recommendations about both space needed and the resources needed.  I admit, they may not have that data, but you don’t get what you don’t ask for.  It never hurts to try.

Click through for Erin’s discussion points across hardware and storage requirements over time.

Comments closed

Using Event Notifications

Dave Mason lays out how to set up Event Notifications:

When an Event Notification is created, one or more conversations is created between the SQL Server database engine and a Service. I tend to think of these as “message channels”. When the related event occurs, SQL Server calls the EVENTDATA() function, returning the event information results (as a variable of type XML) to a Service. The Service in turn writes the information to a Queue. A Queue is a FIFO data structure. Conceptually it is similar to a table of rows and columns. You can SELECT from it, but you can’t directly insert or update its rows. You “delete” rows from a Queue via the RECEIVE statement.

Dave has a full example worked out at the link.

Comments closed

Comparing Active Directory Membership

Jana Sattainathan shows how to use Powershell to compare Active Directory group membership for two users:

Before we begin, if you are running Windows 7 or if you do not have the Active Directory module installed, please do so first by downloading and installing “Remote Server Administration Tools”. I personally cannot live without this module.

Let us call the new employee as NEW_USER and the existing employee as OLD_USER. Here is how to use PowerShell to do the comparison.

The retrieval and comparison code is pretty easy; click through to see.

Comments closed

VSphere 6.5 And VNUMA

David Klee notes that vSphere 6.5 might modify vNUMA settings on you:

I appreciate their attempt to improve performance, but this presents a challenge for performance-oriented DBAs in many ways. It now changes expected behavior from the basic configuration without prompting or notifying the administrators in any way that this is happening. It also means that if I have a host cluster with a mixed server CPU topology, I could now have NUMA misalignments if a VM vMotions to another physical server that contains a different CPU configuration, which is sure to cause a performance problem.

Worse yet is that if I were to restart this VM on this new host, the hypervisor could automatically change the vNUMA configuration at boot time based on the new host hardware.

I now have a change in vNUMA inside SQL Server. My MaxDOP settings could now be wrong. I now have a change in expected query behavior. 

Definitely read the whole thing if you’re using vmWare.

Comments closed

Running MapReduce Polybase Queries

I have a post which digs into what happens when a Polybase query invokes a MapReduce job:

Stream 2 sends along 27 MB worth of data.  It’s packaging everything Polybase needs to perform operations, including JAR files and any internal conversion work that the Polybase engine might need to translate Hadoop results into SQL Server results.  For example, the sqlsort at the bottom is a DLL that performs ordering using SQL Server-style collations, as per the Polybase academic paper.

Stream 2 is responsible for almost every packet from 43 through 19811; only 479 packets in that range belonged to some other stream (19811 – 43 – 18864 – 425 = 479).  We send all of this data to the data node via port 50010.

If you love looking at Wireshark streams, you’ll love this post.

Comments closed

Cardinality Estimator Regressions

SQL Scotsman has a great post on figuring out which of your queries have become worse as a result of the SQL Server cardinality estimator changes in 2014:

Instantly it is apparent that the most resource intensive query was the same query across both workload tests and note that the query hash is consistent too.  It is also apparent that this query performs worse under the new cardinality estimator model version 120.  To investigate and understand why this particular query behaves differently under the different cardinality estimators we’ll need to look at the actual query and the execution plans.

Looking at the information in #TempCEStats and the execution plans, the problematic query below belongs to the SLEV stored procedure.

There’s also a discussion of Query Store in there, but it’s important to understand how to figure this out even if you’re on 2014 and don’t have access to Query Store.

Comments closed