Press "Enter" to skip to content

Author: Kevin Feasel

SQL Data Warehouse Distribution Keys

Simon Whiteley explains the different distribution key options available in Azure SQL Data Warehouse and SQL Server APS:

Each record that is inserted goes onto the next available distribution. This guarantees that you will have a smooth, even distribution of data, but it means you have no way of telling which data is on which distribution. This isn’t always a problem!

If I wanted to perform a count of records, grouped by a particular field, I can perform this on a round-robin table. Each distribution will run the query in parallel and return it’s grouped results. The results can be simply added together as a second part of the query, and adding together 60 smaller datasets shouldn’t be a large overhead. For this kind of single-table aggregation, round-robin distribution is perfectly adequate!

However, the issues arise when we have multiple tables in our query. In order to join two tables. Let’s take a very simple join between a fact table and a dimension. I’ve shown 6 distributions for simplicity, but this would be happening across all 60.

Figuring out which distribution key to use can make a huge difference in performance.

Comments closed

Full-Text Indexing Issues

Cody Konior looks at a couple errors in full-text indexing:

The first error message holds the key. It’s not enough to just enable the service, you need to run that command also (which will also start the service for you as well as doing something internally) or restart SQL Server after the service is enabled.

Here’s where we go to the next level. You now know that full-text indexing can have a problem on startup that isn’t visible externally but surely this could never happen to you, right? But I guarantee you that if you have 500 servers many of them currently have this issue. When a developer or application support person encounters it that’s going to take half an hour or an hour of investigation, then logging incidents, then passing it on to you, for another half hour or hour of investigation, and then more time, and more time, and then doing a root cause analysis, and then sharing it with your team, and then…

Wouldn’t it be so much easier to just write a validation test for this now so that if this ever occurs you can just proactively go fix it? Well you can!

It’s a good read.

Comments closed

Multiple Flat Files SSIS Connector

John Morehouse shows that there are more SSIS connection managers than meets the eye:

While recently examining an existing SSIS package  before implementing some needed changes I noticed the connection manager for multiple flat files.

Normally, when processing multiple files, I use a sequence container and loop through them.  It’s usually a cumbersome process to setup and you must configure variables.  I was not aware the multiple flat file type of connection manager existed. No wonder because it’s notwithin the normal list.  This handy thing could only be found when I looked under “New Connection”.

Click through for more details.

1 Comment

Perfmon Counters For Monitoring AGs

Tracy Boggiano has a set of Perfmon counters she uses to monitor Availability Groups:

Monitoring Availability Groups can be tricky.  The DMVs don’t update the log_send_rate and redo_rate all the time especially if nothing is happening so if you try to use those for calculations when monitoring you could false results worse yet pages in the middle of the night.  In order to calculate the log_send_rate and redo_rate you need to capture the perfmon counters ‘Log Bytes Flushed/sec’, ‘Redone Bytes/sec’, and ‘Log Bytes Received/sec’ into temporary tables WAITFOR 1 second then capture them again.  Below is query that captures this along side what you see in the DMV for when Microsoft might fix the issue.

Click through for a script as well as a Powershell cmdlet wrapper for running against a set of hosts in your Central Management Server.

Comments closed

Re-Provisioning Volumes In Azure

Robert Bishop shows how to create data disks in Azure and attach them to VMs:

Recently a client found this article on “Best Practices for SQL Server in Azure Virtual Machines” and wanted to re-provision his volumes to adhere to them.

No my first thoughts was wait, I’m a DBA, not a System Admin that’s not my role! But thinking more about it I realized the client views this as a SQL Server issue and I am the SQL Server Consultant and that it is my job to remedy this problem.

Not being 100% confident in Azure, I spun up a VM SQL Server and attempted to add some volumes.  To my surprise, this was way too easy.

Click through for the steps.

Comments closed

Interlinked Parameters In Powershell

Rob Sewell shows how to make one parameter’s valid set of values depend upon a previous parameter’s selected value:

But what Sander wanted was to validate the value of the second parameter depending on the value of the first one. So lets say we wanted

  • If word is sun, number must be 1 or 2
  • If word is moon, number must be 3 or 4
  • If word is earth, number must be 5 or 6

We can use the ValidateScriptAttribute  to do this. This requires a script block which returns True or False. You can access the current parameter with $_ so we can use a script block like this

Any Powershell post with a link to June Blender is automatically 5% better in my book.

Comments closed

Stopwatches

Drew Furgiuele explains how to use the .NET Stopwatch class in Powershell:

We can see I cleared the history buffer of my session ( Clear-History ), then ran a script. It’s nothing fancy; just connecting to my local instance of SQL Server and outputting a list of tables to a text file. With Get-History , I can see every command I put in the buffer, and using expressions I can calculate how long a command took.

And that’s great and all, but that’s the entire execution time. If there are multiple steps to your function, how long does each step take? If your script execution is 186 seconds, how much of time is spent on a database query? What about a loop? How long does each iteration take? Are you writing to a network share, and want to know what the latency is? Get-History  isn’t the tool of this, but thankfully we have other methods.

I use the stopwatch a lot for similar things; it’s a useful tool.

Comments closed