Author: Kevin Feasel

Grouping And Binning

Reza Rad discusses a couple new additions to Power BI, grouping and binning:

Binning is grouping a numeric field based on a division. This type of grouping is also called banding. For example, you might have customers with a YearlyIncome ranging from $10,000 to $100,000, and you want to create bands of $25,000. This will generate four groups of yearly income for you. This is exactly what Binning in Power BI does. Let’s look at an example.

Create a Table in a Power BI report and visualize YearlyIncome (from DimCustomer) and SalesAmount (from FactInternetSales) in it. Change the aggregation of YearlyIncome from Sum to Do Not Summarize.

You could already build this yourself, but I’m glad they introduced this, as it’s an easier solution.
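
If you do want to roll it yourself, the equivalent banding is a one-liner in most tools. Here is a minimal T-SQL sketch, reusing the DimCustomer table and YearlyIncome column from the example above:

    -- Band YearlyIncome into $25,000 buckets keyed by each band's lower edge
    SELECT
        FLOOR(c.YearlyIncome / 25000.0) * 25000 AS IncomeBandFloor,
        COUNT(*) AS CustomerCount
    FROM dbo.DimCustomer c
    GROUP BY FLOOR(c.YearlyIncome / 25000.0) * 25000
    ORDER BY IncomeBandFloor;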

Logical Windowing

Lukas Eder discusses window functions:

Now, let’s assume I’m interested in these things:

  1. How many payments were there in the same hour as any given payment?
  2. How many payments were there in the same hour before any given payment?
  3. How many payments were there within one hour before any given payment?

Those are three entirely different questions.

Lukas’s solution uses Oracle syntax, but most of it also applies to SQL Server 2012 and higher.  The part that doesn’t apply, unfortunately, is RANGE BETWEEN INTERVAL, which lets you find values clustered in the same time period (one hour in his example).
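
As a rough T-SQL translation of two of those questions (assuming a hypothetical dbo.Payments table with a PaymentDate column), the first is a straight partition by hour, while the third needs a workaround for the missing interval-based frame:

    -- Q1: how many payments fell in the same hour as each payment?
    SELECT
        p.PaymentDate,
        COUNT(*) OVER (
            PARTITION BY DATEADD(HOUR, DATEDIFF(HOUR, 0, p.PaymentDate), 0)
        ) AS PaymentsSameHour
    FROM dbo.Payments p;

    -- Q3: how many payments fell within one hour before each payment?
    -- With no RANGE BETWEEN INTERVAL in SQL Server, a correlated subquery stands in.
    SELECT
        p.PaymentDate,
        (SELECT COUNT(*)
         FROM dbo.Payments p2
         WHERE p2.PaymentDate > DATEADD(HOUR, -1, p.PaymentDate)
           AND p2.PaymentDate <= p.PaymentDate) AS PaymentsWithinHourBefore
    FROM dbo.Payments p;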

Hypothetical Indexes

Kenneth Fisher discusses hypothetical indexes:

I saw something like this the other day. My first thought was “Huh, never seen that before.” My second thought was “Wow, that’s really cool. I wonder what a hypothetical index is?” A quick search later and I discovered that the DTA (Database Tuning Advisor) uses them to test out which indexes will work best. A pretend (one might almost say hypothetical) index is created, with statistics, but without the actual index structure. Then a query plan is created allowing for that index.

This is pretty cool since creating a real index can take quite a bit of time, particularly on a really large table. It would be nice to be able to tell SQL Server that an index exists and try it out before actually spending the time creating it. I’d learned about a DB2 method of doing this a while back but wasn’t aware of one for SQL Server. In part that’s because it’s undocumented. Because the commands I’m going to use here are undocumented, standard warnings apply.

That’s completely new to me.
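
Kenneth’s post has the full walkthrough; the core of it is an undocumented CREATE INDEX option, so treat the following as a sketch for a sandbox only (dbo.SomeTable and its column are placeholders):

    -- Undocumented: builds statistics but no index structure (sandbox use only)
    CREATE INDEX IX_Hypothetical_Test
        ON dbo.SomeTable (SomeColumn)
        WITH STATISTICS_ONLY = -1;

    -- Hypothetical indexes are flagged in sys.indexes
    SELECT name, is_hypothetical
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.SomeTable');

    -- Clean up like any other index
    DROP INDEX IX_Hypothetical_Test ON dbo.SomeTable;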

Scheduler Stories

Ewald Cress has a couple of posts about the scheduler.  First, fiber mode scheduling:

The title of this post is of course an allusion to Ken Henderson’s classic article The perils of fiber mode, where he hammers home the point that fiber scheduling, a.k.a. lightweight pooling, appears seductive until you realise what you have to give up to use it.

We’ll get to the juicy detail in a moment, but as a reminder, the perils of fibers lie in their promiscuity: many fibers may share one thread, its kernel structures and its thread-local storage. This is no problem for code that was written with fibers in mind, including all of SQLOS, but unfortunately there are bodies of code for which this isn’t true.

Next up is the Windows scheduler:

Hardware interrupts, which run in kernel mode and return to user mode quickly, should be nothing more than tiny hiccups in a running thread’s quantum. The other 90% of the interrupt iceberg manifests in user mode as Deferred Procedure Calls (DPCs, or “bottom halves” to the Linux crowd) but should still only steal small change in terms of CPU cycles. Context switches to another thread represent a completely different story, because it could be ages before control returns to our thread, meaning that our fiber scheduler is completely out of commission for a while.

This possibility – a SQLOS scheduler losing the CPU for an extended period – is just one of those things we need to live with, but on a sane server, it shouldn’t be something to be too concerned about. Consider that this happens all the time in virtualised environments, where our vCPU can essentially cease to exist while another VM has a ride on the physical CPU.

These are fairly long reads, but we’re getting down to the level where you can see these settings in the Database Engine (like lightweight pooling).
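
For reference, the setting Ewald is describing is the “lightweight pooling” sp_configure option. Checking it is harmless; actually enabling fiber mode is rarely a good idea, for exactly the reasons Ken Henderson laid out:

    -- It's an advanced option, so expose those first
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- 0 (the default) = thread mode; 1 = fiber mode (lightweight pooling)
    EXEC sp_configure 'lightweight pooling';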

Query Store Filegroups

Kendra Little links to a Connect item:

Can you change the filegroup where Query Store keeps its data?

I thought there might be a trick to use a different filegroup for Query Store by using the default filegroup setting in SQL Server before enabling it, but NOPE!

Please vote for this to be improved in this Connect Item.

I concur; Query Store can grow to be pretty large on busy systems, so diligent DBAs who want to keep PRIMARY as small as possible will suddenly find a multi-gigabyte Query Store slowing down those PRIMARY filegroup restores.
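
Until that changes, about the best a DBA can do is cap Query Store’s growth rather than relocate it. A minimal sketch using the options that do exist:

    -- No filegroup choice exists, but you can limit the size and age of what's kept
    ALTER DATABASE CURRENT
    SET QUERY_STORE (
        MAX_STORAGE_SIZE_MB = 1024,
        CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30)
    );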

Using SQL Server DMA

Arun Sirpal has a series on using the new Data Migration Assistant.  Part 1:

It “enables you to upgrade to a modern data platform by detecting compatibility issues that can impact database functionality on your new version of SQL Server. It recommends performance and reliability improvements for your target environment. It allows you to not only move your schema and data, but also uncontained objects from your source server to your target server”. It can be found at this link: https://www.microsoft.com/en-us/download/details.aspx?id=53595.

Part 2 is all about performing a migration:

By the way, the backup file created via the tool is temporary; it is deleted after the migration. Also, the compatibility level DOES NOT change; you need to do this yourself.

I haven’t used this tool yet, but it does look like an upgrade to the old Upgrade Advisor.
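
That compatibility level note is easy to miss: after the migration, the database keeps its old level until you raise it yourself, along these lines (database name is a placeholder):

    -- DMA leaves the source compatibility level in place; bump it to match the target
    -- (130 = SQL Server 2016)
    ALTER DATABASE MyMigratedDb SET COMPATIBILITY_LEVEL = 130;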

Thinking About Compile Time

Jay Robinson has a post on compilation time and especially indexed views:

What I found was that worker time needed to compile these queries is indistinguishable from that needed to execute them. To show this, let’s look at an example in AdventureWorks2014. In this example, I’m going to create and execute two similar procedures. I’m also going to create a number of indexed views.

Why indexed views? I want to increase compile time significantly for this exercise, and a large number of indexed views can do that. From MSDN: “The query optimizer may use indexed views to speed up the query execution. The view does not have to be referenced in the query for the optimizer to consider that view for a substitution.” My thanks to oas_public on stackoverflow.com for that tip.

Indexed views come at a cost, as Jay shows.
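
If you want to reproduce the experiment in miniature, an indexed view is just a schema-bound view plus a unique clustered index, and SET STATISTICS TIME reports compile time separately from execution time. A sketch with hypothetical table names:

    -- Minimal indexed view: SCHEMABINDING and COUNT_BIG(*) are required with GROUP BY
    CREATE VIEW dbo.vSalesByProduct
    WITH SCHEMABINDING
    AS
    SELECT ProductID, COUNT_BIG(*) AS RowCnt
    FROM dbo.SalesDetail
    GROUP BY ProductID;
    GO

    -- The unique clustered index is what actually materializes the view
    CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
        ON dbo.vSalesByProduct (ProductID);
    GO

    -- "SQL Server parse and compile time" now shows up apart from elapsed time
    SET STATISTICS TIME ON;
    SELECT ProductID, COUNT_BIG(*) AS RowCnt
    FROM dbo.SalesDetail
    GROUP BY ProductID;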

Azure SQL DW Statistics

Emma Stewart looks at how statistics are created in Azure SQL Data Warehouse:

In Azure SQL Data Warehouse, statistics have to be created manually. On previous SQL Server projects, creating and maintaining statistics wasn’t something that we had to incorporate into our design (and really think about!); however, with SQL DW we need to make sure we think about how to include it in our process in order to take advantage of the benefits of working with Azure DW.

The major selling point of Azure SQL Data Warehouse is that it is capable of processing huge volumes of data, one of the specific performance optimisations that has been made is the distributed query optimiser. Using the information obtained from the statistics (information on data size and distribution), the service is able to optimize queries by assessing the cost of specific distributed query operations. Therefore, since the query optimiser is cost-based, SQL DW will always choose the plan with the lowest cost.

Azure SQL Data Warehouse is a bit of a strange animal, with differences in statistics being one of the smaller changes versus “classic” SQL Server.
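
For anyone coming from the “classic” engine, the manual step Emma describes is a single statement per statistics object; a sketch against a hypothetical fact table:

    -- Azure SQL DW does not auto-create these; one per interesting column
    CREATE STATISTICS stat_FactSales_CustomerKey
        ON dbo.FactSales (CustomerKey);

    -- Multi-column statistics help with correlated predicates
    CREATE STATISTICS stat_FactSales_DateKey_StoreKey
        ON dbo.FactSales (DateKey, StoreKey);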

Details On Azure SSAS

James Serra breaks down what Azure Analysis Services has to offer:

  • Developers can use SQL Server Data Tools (SSDT) in Visual Studio for creating models and deploying them to the service.  Administrators can manage the models using SQL Server Management Studio (SSMS) and investigate issues using SQL Server Profiler

  • Business users can consume the models in any major BI tool.  Supported Microsoft tools include Power BI, Excel, and SQL Server Reporting Services.  Other MDX compliant BI tools can also be used, after downloading and installing the latest drivers

  • The service currently supports tabular models (compatibility level 1200 only).  Support for multidimensional models will be considered for a future release, based on customer demand

Between tabular-only support and the max size being 100 GB (if I’m reading this correctly), they’re not yet ready to push the product hard.  Given that it just came out, that makes sense, and hopefully the training wheels come off.

Sparklines

Devin Knight continues his custom visuals series with the sparkline:

Key Takeaways

  • Shows trends in data, most often over time.

  • It can only visualize a single measure.

  • The Sparkline can be configured to animate across time.

In the right circumstances, I love sparklines.  My circumstances are as follows:  you are viewing time series data for relatively few elements, where the trend is more important than the levels.  In that scenario, sparklines are efficient and tell the story without extraneous numbers or clutter getting in the way.
