

Aborting Index Rebuilds

Arun Sirpal shows how to use the ABORT_AFTER_WAIT attribute on an index rebuild command:

Looking into the locking, you will see that the ONLINE operation uses a Schema Modification lock (Sch-M) on the corresponding table as part of the process (it actually takes a Shared table lock (S) at the beginning of the operation, and a Schema Modification lock (Sch-M) at the end).

So to be granted a Sch-M lock you can’t have any conflicting locks. What happens, then, if you have a process that is updating the table and you want to use the ONLINE rebuild? Yes, you will be blocked. From SQL Server 2014 onwards we can control what happens if we get into this situation, and for this post I am going to abort the other query that is causing me to wait.

Not sure I like the “Kick the other guy(s) off” part that much, but I can see uses.  It’s probably more likely to go the opposite route, cancelling the rebuild if the server’s too hot.
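
For reference, a minimal sketch of the syntax being discussed, with hypothetical table and index names. WAIT_AT_LOW_PRIORITY is the SQL Server 2014+ option; ABORT_AFTER_WAIT = BLOCKERS kills the blocking sessions, while SELF would cancel the rebuild itself (the "opposite route" mentioned above).

ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH
(
    ONLINE = ON
    (
        WAIT_AT_LOW_PRIORITY
        (
            MAX_DURATION = 1 MINUTES,     -- how long to wait for the Sch-M lock
            ABORT_AFTER_WAIT = BLOCKERS   -- kill whoever is blocking; SELF cancels the rebuild instead
        )
    )
);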


DBCC SHOW_STATISTICS Update

Erik Darling notes that his Connect item to replace DBCC SHOW_STATISTICS has been marked as resolved:

So what does it look like?

I have no idea. I don’t know if it’s a DMV or a function, I don’t know what it’s called, and I don’t know what information it exposes. I also don’t know how it will get joined to other DMVs. There were no details offered up when the status changed. And I’m fine with that! I’m pretty psyched that it got enough traction to get a fix to begin with. If anyone from MS feels like shooting me an email with details, I won’t complain.

But since we don’t know, we’re free to speculate. Like all those History Channel shows about aliens and fake animals and where the Templars secretly buried Jesus’ gold teeth in Arizona. It’ll be fun!

It’ll be interesting to see the results.
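
For context, the command in line for replacement looks like this today (hypothetical table and statistic names); whatever supersedes it would presumably expose the same header, density vector, and histogram data in a form you can join to other DMVs.

DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID');
-- Or pull just one of the three result sets:
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID') WITH HISTOGRAM;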


JSON Parsing In U-SQL

Ginger Grant pulls out everybody’s favorite .NET JSON parser:

In U-SQL there are built-in extractors for parsing text, comma-delimited, or tab-delimited files. Once again, parsing JSON becomes problematic. There is a solution built into U-SQL: write some C# code to extend it, or use someone else’s C# code to extend U-SQL. Since I wanted to parse JSON, fortunately there are libraries available on GitHub containing the information required to do it. Download the GitHub package and open up the Microsoft.Analytics.Samples project in Visual Studio. When I did this the first time, there was a problem loading the Newtonsoft.Json reference, so I right-clicked on the references and downloaded the missing parts again. Build the solution and check out the code in the directory …Examples\DataFormats\Microsoft.Analytics.Samples.Formats\bin\Debug\. There will be two DLLs, Microsoft.Analytics.Samples.Formats.dll and Newtonsoft.Json.dll. These DLLs then need to be registered in Data Lake Analytics, and locally if you choose to run your U-SQL locally. As the goal at some point is to run from within Data Lake Analytics, you will need to copy both of these DLLs to the data lake. I created a folder for the DLLs called Assemblies, and ran this command

It’s funny how often that library comes up…  Click through to see how to use it with U-SQL jobs.
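
As a rough sketch (not Ginger's exact script), once both DLLs are registered as assemblies in your Data Lake Analytics database, a U-SQL job can reference them and use the sample JsonExtractor like this; the paths and column names here are made up for illustration.

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// Extract a couple of fields from a JSON file sitting in the data lake
@input =
    EXTRACT id string,
            name string
    FROM "/Input/sample.json"
    USING new JsonExtractor();

OUTPUT @input
TO "/Output/sample.csv"
USING Outputters.Csv();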


Using Statistics For Index Design

Kendra Little argues that you should not use automatically created statistics as a guide for index creation:

We’ve talked a lot so far about how much statistics and indexes are related. This is why it seems like statistics might be useful for designing indexes!

But here’s the thing — SQL Server doesn’t track and report on how many times a statistic was used during optimization.

This is an interesting discussion.
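
If you do want to see which statistics the engine has created on its own (the _WA_Sys_ ones), a query along these lines works; the table name is hypothetical. Note that, per Kendra's point, this tells you the statistics exist and when they were updated, not whether the optimizer ever found them useful.

SELECT s.name AS stats_name,
       c.name AS leading_column,
       s.auto_created,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.object_id = sc.object_id
   AND s.stats_id = sc.stats_id
JOIN sys.columns AS c
    ON sc.object_id = c.object_id
   AND sc.column_id = c.column_id
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders')  -- hypothetical table
  AND s.auto_created = 1
  AND sc.stats_column_id = 1;                -- just the leading column of each statistic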


Hurricane Matthew Tracking With Power BI

Chris Albrektson has a Power BI report tracking Hurricane Matthew:

We’ve got company, and it’s not the type of company that you want! As most Floridians are preparing for Hurricane Matthew, I thought it might be neat to track the storm using Power BI. So I went out and found some public data online, brought that into Power BI, and created a couple of calculations and some visualizations.

My goal for this was to create a report where I could track the storm no matter where I was. I also needed the ability to see the latest data without any manual intervention. Power BI can handle all of this for me, utilizing the Power BI Mobile app and a few other cool features.

Good use of Power BI here.


Use Folders With Polybase

Andrew Peterson argues that you should use folders instead of individual files when creating external tables:

Why?
1) Add more files to the directory, and the PolyBase external table will automagically read them.
2) Do INSERTs and UPDATEs from PolyBase back to your files in Hadoop (see PolyBase – Insert data into a Hadoop Hue Directory and PolyBase – Insert data into new Hadoop Directory).
3) It’s cleaner.

This is good advice.  Also, if you’re using some other process to load data—for example, a map-reduce job or Spark job—you might have many smaller file chunks based on what the reducers spit out.  It’s not a bad idea to cat those file chunks together, but at least if you use a folder for your external data location, your downstream processes will still work as expected regardless of how the data is laid out.
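
A sketch of the idea, assuming an external data source and an external file format have already been created (every name below is hypothetical). The key detail is that LOCATION points at a directory rather than at a single file, so anything dropped into that folder gets picked up automatically.

CREATE EXTERNAL TABLE dbo.SensorReadings
(
    DeviceId     INT,
    ReadingTime  DATETIME2(0),
    ReadingValue DECIMAL(18, 4)
)
WITH
(
    LOCATION = '/data/sensor_readings/',   -- a folder; every file inside gets read
    DATA_SOURCE = HadoopStorage,           -- assumed pre-existing EXTERNAL DATA SOURCE
    FILE_FORMAT = TabDelimitedFile         -- assumed pre-existing EXTERNAL FILE FORMAT
);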


NetworkD3

Vessy combines JavaScript and R to visualize networks:

The networkD3 package provides a function called igraph_to_networkD3, which takes an igraph object and converts it into the format that networkD3 uses to create a network representation. As I used an igraph object to store my network, including node and edge properties, I was hoping that I might only need to use this function to create a visualization of my network. However, this function does not work exactly like that (which is not that surprising, given the differences in how D3.js works and how an igraph object is defined). Instead, it extracts lists of nodes and edges from the igraph object, but not the information about all node and edge properties (the exception is a priori specified information about node membership in groups/clusters, which can be derived from one or more network properties, e.g., node degree). Additionally, the igraph_to_networkD3 function does not plot the network itself, but only extracts parameters that are later used in the forceNetwork function that plots the network.

This is the kind of thing I want to see when working with network data.  It doesn’t necessarily scale, but given how well the human eye tracks relationships, this is very useful.
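
Here's a minimal sketch of the workflow Vessy describes, using a throwaway random graph instead of her data: build an igraph object, derive groups from a community-detection algorithm, convert with igraph_to_networkD3, and hand the pieces to forceNetwork to do the plotting.

library(igraph)
library(networkD3)

# Toy network: a random graph, with detected communities used as the node groups
g <- sample_gnp(30, 0.1)
groups <- membership(cluster_walktrap(g))

# Convert the igraph object into the nodes/links lists networkD3 expects
d3_data <- igraph_to_networkD3(g, group = groups)

# forceNetwork does the actual D3.js plotting
forceNetwork(Links = d3_data$links, Nodes = d3_data$nodes,
             Source = "source", Target = "target",
             NodeID = "name", Group = "group",
             opacity = 0.9)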


Understanding Page Splits

Wayne Sheffield goes into detail on page splits:

In considering which of these methods is preferred, we need to consider whether page splits impact these methods – especially nasty page splits. Furthermore, how will index maintenance affect each choice? So let’s think this through.

When there are negative values in this column, and the index is rebuilt, there will be a page with both negative and positive values in it. If the identity column is set to (-1, -1), there won’t be a gap (excluding the 0) in the values, and newly added rows will get a new page allocated – a good page split. If the identity column is set to (-2147483648, 1), then there will be a full page with the records for the most recently used identity value, and with the values starting with 1 – a rather large gap.

This is worth reading in its entirety.
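
To make the two seeding choices in the excerpt concrete, here is roughly what they look like in DDL (hypothetical tables): the first continues downward from -1 with no gap next to the existing positive values, while the second starts at the bottom of the INT range and leaves a large gap.

-- Option 1: count downwards from -1
CREATE TABLE dbo.OrdersDown
(
    OrderId INT IDENTITY(-1, -1) NOT NULL PRIMARY KEY CLUSTERED,
    Filler  CHAR(100) NOT NULL DEFAULT ('x')
);

-- Option 2: start at the minimum INT value and count upwards
CREATE TABLE dbo.OrdersUp
(
    OrderId INT IDENTITY(-2147483648, 1) NOT NULL PRIMARY KEY CLUSTERED,
    Filler  CHAR(100) NOT NULL DEFAULT ('x')
);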


Log Buffers

Mark Broadbent has started a series on transaction durability.  His first topic is the log buffer:

SQL Server is a highly efficient transaction processing platform, and nearly every single operation it performs is usually first performed within memory. When operations are performed within memory, the need to touch physical resources (such as physical disk IOPS) is also reduced, and reducing the need to touch physical resources means those physical boundaries (and their limitations) have less impact on the overall system performance. Cool, right?!

Click through to read more about how log buffers work and why they help improve SQL Server’s performance.
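
If you want to see how much your own workload waits on the log while following along, the WRITELOG and LOGBUFFER wait types are the usual place to start; a quick sketch against sys.dm_os_wait_stats:

-- Cumulative log-flush and log-buffer waits since the last restart
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN ('WRITELOG', 'LOGBUFFER')
ORDER BY wait_time_ms DESC;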


Navigating Complex Tabular Models

Bill Anton has a method for understanding large tabular models on legacy platforms:

Unfortunately, the Tabular Model Explorer is only available for 2016 (compatibility 1200) tabular models – which many folks haven’t moved over to just yet (despite the overwhelming list of reasons why SQL 2016 is one of the best releases in a very long time).

Those of us stuck with 2012/2014 environments have no other option than to comb through the diagram view for that one table we’re looking for… or scan the unordered list of tables/columns in grid view, or arrow-key through a bunch of cells in the calculation pane to find a particular measure… or so I thought, up until a few weeks ago when I discovered a better way!

At least 60-70% of the DBA population would chafe at the idea that 2014 is a “legacy” platform, so maybe I’m overstepping a little by calling it legacy. But seriously, 2016 is a huge improvement, well worth the jump.
