
Author: Kevin Feasel

Parallelism Strategies For Grouping Operations

Itzik Ben-Gan continues his series on grouping data in SQL Server by looking at how these operations can go parallel:

Besides needing to choose between various grouping and aggregation strategies (preordered Stream Aggregate, Sort + Stream Aggregate, Hash Aggregate), SQL Server also needs to choose whether to go with a serial or a parallel plan. In fact, it can choose between multiple different parallelism strategies. SQL Server uses costing logic that results in optimization thresholds that under different conditions make one strategy preferred to the others. We’ve already discussed in depth the costing logic that SQL Server uses in serial plans in the previous parts of the series. In this section I’ll introduce a number of parallelism strategies that SQL Server can use for handling grouping and aggregation. Initially, I won’t get into the details of the costing logic, rather just describe the available options. Later in the article I’ll explain how the costing formulas work, and an important factor in those formulas called DOP for costing.

As you will later learn, SQL Server takes into account the number of logical CPUs in the machine in its costing formulas for parallel plans. In my examples, unless I say otherwise, I assume the target system has 8 logical CPUs. If you want to try out the examples that I’ll provide, in order to get the same plans and costing values as I do, you need to run the code on a machine with 8 logical CPUs as well. If your machine happens to have a different number of CPUs, you can emulate a machine with 8 CPUs—for costing purposes—like so:

DBCC OPTIMIZER_WHATIF(CPUs, 8);

Even though this tool is not officially documented and supported, it’s quite convenient for research and learning purposes.
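For a quick sense of how you might use it, here is a minimal sketch (the dbo.Orders table and custid column are assumed stand-ins for whatever sample data you test against, not something from the article):

-- Emulate 8 logical CPUs for costing purposes (undocumented DBCC command).
DBCC OPTIMIZER_WHATIF(CPUs, 8);

-- Compare a forced serial plan against whatever parallel strategy the optimizer picks.
SELECT custid, COUNT(*) AS numorders
FROM dbo.Orders
GROUP BY custid
OPTION (MAXDOP 1);

SELECT custid, COUNT(*) AS numorders
FROM dbo.Orders
GROUP BY custid;

-- Reportedly, ResetAll reverts the emulation; starting a new session also clears it.
DBCC OPTIMIZER_WHATIF(ResetAll);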

This is a fairly long article, but a great one.


Installing Confluent Platform On Windows

Niels Berglund shows how to install Confluent Platform (the Confluent branded version of Apache Kafka) on a Windows machine using the Windows Subsystem for Linux:

WSL is primarily aimed at developers, and it allows you to run Linux environments directly on Windows in a native format and without the overhead of a virtual machine. Let us take another look at that statement: run Linux environments directly on Windows in a native format. Yes, native format: WSL is not a UNIX-like environment like Cygwin, which wraps non-Windows functionality in Win32 system calls; rather, it serves Linux programs as special, isolated minimal processes (pico-processes) attached to kernel-mode pico-providers. If you want to read all the “gory” details about WSL, Windows Subsystem for Linux Overview gives you an excellent introduction.

Surprisingly, it’s pretty easy—I would have expected some strange compatibility issues.


Building Cone Plots In Plotly

The Plotly blog shows how to use Python to build 3D cone plots using Plotly:

This plot uses an explicitly defined vector field. A vector field refers to an assignment of a vector to each point in a subset of space.

In this plot, we visualize a collection of arrows that simply model the wind speed and direction at various levels of the atmosphere.

3-D weather plots can help research scientists gain a better understanding of the atmospheric profile, such as when predicting severe weather events like tornadoes and hurricanes.

Sometimes a 3D plot is the best answer.  When it is, this looks like a good solution.  H/T R-bloggers


Using APPLY To Aggregate Unpivoted Data

Dan Clemens gives us yet another practical use of the APPLY operator:

I had a situation last week where I needed to find the MIN() and MAX() values of some data. Normally this would be pretty straightforward, but the problem was I needed to identify the MIN() and MAX() values amongst multiple columns. So today I bring you a SQL tip using APPLY.

Looking at sys.procedures we can see I have a create_date and a modify_date column for each record. What if we needed to return the MIN() or MAX() value from those 2 (or more) columns?

SELECT [Name]
,create_date
,modify_date
FROM sys.procedures AS p
WHERE p.[name] = 'ChangePassword';

In this two-column example, it’s not too difficult.  As you add more and more columns, the solution remains the same, though the urge to ask why all of these dates are unpivoted might increase…
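For reference, here is a sketch of the CROSS APPLY (VALUES …) pattern the post describes (not necessarily Dan’s exact script; the alias names are mine):

-- Unpivot the two date columns into rows per procedure, then aggregate over them.
SELECT p.[name],
       x.EarliestDate,
       x.LatestDate
FROM sys.procedures AS p
CROSS APPLY
(
    SELECT MIN(d.DateValue) AS EarliestDate,
           MAX(d.DateValue) AS LatestDate
    FROM (VALUES (p.create_date), (p.modify_date)) AS d(DateValue)
) AS x
WHERE p.[name] = 'ChangePassword';

Adding a third or fourth date column just means adding another row to the VALUES list.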


Sorting When Your Measure Is Not In The Visual

Kasper de Jonge shows us different ways of sorting a visual by some unrelated measure:

So let’s start with the simple one: I want to sort a chart on a measure that is not part of the visual. Let’s take this visual:

Now instead of sorting by OrderQuantity, I want to sort by the ListPrice. The trick here is to make the measure part of the query, and one way you can do that is by adding it to the tooltip.

Read on for examples for charts as well as matrices.


Managing Central Management Server

Chrissy LeMaire shows how you can use dbatools to manage Central Management Server and registered servers:

It’s a super useful feature that not all DBAs know about. Since CMS data is stored in msdb and accessible via SMO, you can access it from SQL Server Management Studio or PowerShell modules like dbatools.

Central Management Server’s essential functionality includes:

Actions that are taken by using a central management server group act on all servers in the server group. This includes connecting to servers by using Object Explorer and executing Transact-SQL statements and Policy-Based Management policies on multiple servers at the same time.

I mostly use it as a visual repository of my SQL Servers. Prior to using dbatools and Invoke-DbaSqlQuery, however, I did use CMS to easily execute code against a number of different servers.

CMS is a great feature, and is a critical tool for scaling out a SQL Server infrastructure.
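Since the quote notes that CMS data lives in msdb, here is a minimal T-SQL sketch of that storage side (assuming the sysmanagement_shared_* objects in msdb, which back the registered servers UI in SSMS):

-- List CMS server groups and their registered servers straight from msdb.
SELECT g.name AS group_name,
       s.server_name,
       s.description
FROM msdb.dbo.sysmanagement_shared_server_groups AS g
JOIN msdb.dbo.sysmanagement_shared_registered_servers AS s
    ON s.server_group_id = g.server_group_id
ORDER BY g.name, s.server_name;

The dbatools registered-server commands wrap this same data, so you rarely need to query msdb directly.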


Checking File Sizes In SQL Server

Andy Mallon looks back at a contribution by Junior DBA Andy, this one on checking file sizes:

This is every DBA’s favorite game. Figuring out what DMV contains the data you want. It turns out there are two places that database file size info is maintained. Each database has sys.database_files which has information for that database. The master database also has sys.master_files, which contains information for every database.

Using sys.master_files seems like it would be the obvious choice: everything in one view in master is going to be easier to query than hitting a view in a bunch of different databases. Alas, there’s a minor snag. For tempdb, sys.master_files has the initial file size, but not the current file size. This is valuable information, but doesn’t answer the use cases we set out above. I want to know the current file sizes. Thankfully, sys.database_files has the correct current file sizes for tempdb, so we can use that.

Using sys.database_files seems like it’s going to be the right move for us then. Alas, this isn’t quite perfect either. With Log Shipping, and Availability Group secondaries, if you’ve moved data files to a different location, sys.database_files will contain the location of the files on the primary database. Thankfully, sys.master_files has the correct local file locations for user databases, so we can use that.

Ugh, so it looks like the answer is “both”… we’ll need to use sys.database_files for tempdb, and sys.master_files for everything else.

Click through for the script, including Andy’s critical reflection on how Past Andy has failed him.
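For flavor, here is a rough sketch of the resulting “both” approach (assumed aliases and unit math; Andy’s actual script is in the post):

-- Current file sizes: sys.master_files for every database except tempdb,
-- and tempdb's own sys.database_files for its current sizes. size is in 8 KB pages.
SELECT DB_NAME(mf.database_id) AS database_name,
       mf.name AS logical_name,
       mf.physical_name,
       mf.size * 8 / 1024 AS size_mb
FROM sys.master_files AS mf
WHERE mf.database_id <> DB_ID('tempdb')
UNION ALL
SELECT 'tempdb',
       df.name,
       df.physical_name,
       df.size * 8 / 1024
FROM tempdb.sys.database_files AS df;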


Adding IN Search Functionality To .NET

Jay Robinson shows off a few extension methods he created to make working in C# easier:

Then I could use the extension like this:

if (mySeries.In(Enum.Series.ProMazda, Enum.Series.Usf2000)) myChassis = "Tatuus";

As for the other two methods, well… When is a null not a null? When it’s a System.DBNull.Value, of course! SQL Server pros who have spent any time in the .NET Framework will recognize this awkwardness:

var p = new System.Data.SqlClient.SqlParameter("@myParam", System.Data.SqlDbType.Int);
p.Value = (object)myVar ?? System.DBNull.Value;

With the extension, the second line becomes:

p.Value = myVar.ToDbNull();

I like that Jay ended up going with a different language than T-SQL.  It’s no F#, but it’ll do.


Finding Low-Hanging Fruit When Tuning SQL Server

Kevin Hill has a couple scripts which help him find easy performance gains:

Recently, I’ve been getting a lot of performance tuning work, much of which is basically “things are slow…can you fix them?” type of requests.  Most experienced DBAs know that there a few gazillion factors that can lead to this request, and I won’t re-hash them here.

Let’s assume that we’ve optimized or eliminated the server, disks, network, etc., and we’re now looking at SQL code and everyone’s favorite: indexes.

I have two scripts I use that give me a quick overview of the type of work SQL Server is doing.   These complement each other, and are used AS A STARTING POINT to locate the low-hanging fruit that can be causing excessive work for the disks and memory of the server.

Click through for those scripts.
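As a generic flavor of that kind of starting point (this is not Kevin’s script, just a common sketch built on the plan-cache DMVs):

-- Top cached statements by logical reads, which is often where the low-hanging fruit
-- behind disk and memory pressure shows up first.
SELECT TOP (10)
       qs.execution_count,
       qs.total_logical_reads,
       qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_logical_reads DESC;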


Building Recurrent Neural Networks Using TensorFlow

Ahmet Taspinar walks us through creating a recurrent neural network topology using TensorFlow:

As we have also seen in the previous blog posts, our Neural Network consists of a tf.Graph() and a tf.Session(). The tf.Graph() contains all of the computational steps required for the Neural Network, and the tf.Session is used to execute these steps.

The computational steps defined in the tf.Graph can be divided into four main parts:

  1. We initialize placeholders which are filled with batches of training data during the run.

  2. We define the RNN model and use it to calculate the output values (logits).

  3. The logits are used to calculate a loss value, which then

  4. is used in an Optimizer to optimize the weights of the RNN.

As a lazy casual, I’ll probably stick with letting Keras do most of the heavy lifting.
