Press "Enter" to skip to content

Author: Kevin Feasel

Cardinality Estimation And String Splits

Dan Holmes points out a quirk of estimated row counts with CLR-based functions:

That is an enormous amount of data.  What if you needed to sort that?  What if you joined this to another table or view and a spool was required.  What it it was a hash join and a memory grant was required?  The demand that this seemingly innocuous statement placed on your server could be overwhelming.

The memory grant could create system variability that is very difficult to find.  There is a thread on MSDN that I started which exposes what prompted this post.  (The plan that was causing much of the problem is at this link.)

It’s important to keep in mind the good enough “big round figures” that SQL Server uses for row estimation when stats are unavailable (e.g., linked server to Hive or a CLR function like in the post).  These estimates aren’t always correct, and there are edge cases like the one in the post in which the estimates will be radically wrong and begin to affect your server.

Comments closed

Cardinality Estimation And Hints

Aaron Bertrand uses a Visual Studio Online outage to talk about query hints:

These are all things that may have been necessary under the old estimator, but are likely just tying the optimizer’s hands under the new one. This is a query that could have, and should have, been tested in their dev / staging / QA environments under the new cardinality estimator long before they flipped the switch in production, and probably could have gone through series of tests where different combinations of those hints and options could have been removed. This is something for which that team can only blame themselves.

Also check out Aaron Morelli’s comment on the post.

Comments closed

Investigating Cleveland

Dave Mattingly goes spatial on Cleveland:

From here, we can:

  • zoom in for more detail

  • hover over a building, road, or other feature to see its name or other column

  • display a label on the results

  • apply filters to only show parts of the data

  • change the widths of the features by changing the STBuffer

  • do lots of other cool stuff

Spatial types and display in SQL Server has always been a weak point for me, so I enjoy seeing the fruits of somebody who is very good at it.

Comments closed

Drive Failure

SQL Wayne cautions you to think about drive failure:

Now is when the MTBF comes in. If all of the drives were from the same batch, then they have approximately the same MTBF. One drive failed. Thus, all of the drives are not far from failure. And what happens when the failed drive is replaced? The RAID controller rebuilds it. How does it rebuild the new drive? It reads the existing drives to recalculate the checksums and rebuild the data on the new drive. So you now have a VERY I/O intensive operation going on with heavy read activity on a bunch of drives that are probably pushing end of life.

This is where it’s important to keep spares and cycle out hardware.

Comments closed

Credit Card Fraud Detection Using R

David Smith gives us a tutorial on credit card fraud detection:

If you have a database of credit-card transactions with a small percentage tagged as fraudulent, how can you create a process that automatically flags likely fraudulent transactions in the future? That’s the premise behind the latest Data Science Deep Dive on MSDN. This tutorial provides a step by step to using the R language and the big-data statistical models of the RevoScaleR package of SQL Server 2016 R Services to build and use a predictive model to detect fraud.

This looks to be a follow-up from the fraud detection series.

Comments closed

Plan Cache Spelunking

Ed Pollack digs into the plan cache:

The data in the plan cache is not static, and will change over time. Execution plans, along with their associated query and resource metrics will remain in memory for as long as they are deemed relevant. Plans can be removed from cache when there is memory pressure, when they age out (they get stale), or when a new plan is created, rendering the old one obsolete. Keep this in mind as we are looking around: The info we find in the plan cache is transient and indicative of a server’s current and recent activity and does not reflect a long-term history. As a result, be sure to do thorough research on the plan cache prior to making any significant decisions based on that data.

The plan cache is one of the best ways of figuring out what’s going on in your SQL Server instances, but there’s a little bit of complexity to it.

Comments closed

Sorting Power Pivot Data On Load

Matt Allington suggests pre-sorting results to reduce load in Power Pivot:

Imagine you have 50,000 products in your data table and you have 50,000,000 rows of data.  Power Pivot will take the first 1 million rows it comes to (1 segment worth), work out how to sort and compress the columns, and then compress the data into a single segment before moving to the next 1 million rows it comes to (in the order they are loaded).  When it does this, it is highly likely that every product number will appear in every single segment – all 50 segments.  If we assume an equal number of product records for each product (unlikely but OK for this discussion), then there would be 1,000 records for each product spread throughout the entire data table,and each and every segment is likely to contain all 50,000 product IDs.  This is not good for compression.

This is an interesting result and not something I would have thought intuitive.

Comments closed

Understanding Logins Versus Users

Kenneth Fisher has a user which should have rights but is unable to access the database in question:

The other day I ran across an interesting problem. A user was logging in but didn’t have access to a database they were certain they used to access to. We checked and there they were. Not only was there a database principal (a user) but it was a member of db_owner. But still no go. The user could not connect. I went to the database and impersonated them and then checked sys.fn_my_permissions. They were definitely a member of db_owner. I tested and yes, I could read the tables they needed, and yes they could execute the stored procedures they needed to execute. So what was wrong?

Keep those principals in alignment.

Comments closed

Dynamic Data Masking In Detail

Louis Davidson has a two-part series on dynamic data masking in SQL Server 2016.

Part 1:

An interesting feature that is being added to SQL Server 2016 is Dynamic Data Masking. What it does is, allow you to show a user a column, but instead of showing them the actual data, it masks it from their view. Like if you have a table that has email addresses, you might want to mask the data so most users can’t see the actual data when they are querying the data. It falls under the head of security features in Books Online (https://msdn.microsoft.com/en-us/library/mt130841.aspx), but as we will see, it doesn’t behave like classic security features, as you will be adding some code to the DDL of the table, and (as of this writing in CTP3.2, the ability to fine tune who can and cannot see unmasked data isn’t really there.)

Part 2:

The moral here is that need to be careful about how you use this feature. It is not as strict as column level security (or as Row Level Security will turn out to be, which is the next series of blogs to follow), so if a user has ad-hoc access to your db, they could figure out the data with some simple queries.

Louis’s second part is particularly interesting, as he delves into the various ways in which you can back into answers (some of which, like casting values to other types, have been fixed).

Comments closed