The Non-Blocking Segment Operator

Hugo Kornelius notes a documentation bug with the Segment operator:

The Segment operator, like all operators, is described at the Books Online page mentioned above. Here is the description, quoted verbatim:

Segment is a physical and a logical operator. It divides the input set into segments based on the value of one or more columns. These columns are shown as arguments in the Segment operator. The operator then outputs one segment at a time.

Looking at the properties of the Segment operator, we do indeed see the argument mentioned in this description, in the Group By property (highlighted in the screenshot). So this operator reads the data returned by the Index Scan (sorted by TerritoryID, which is required for it to work; this is why the Index Scan operator is ordered to perform an ordered scan), and divides it into segments based on this column. In other words, this operator is a direct implementation of the PARTITION BY spefication. Every segment returned by the operator is what we would call a partition for the ROW_NUMBER() function in the T-SQL query. And this enables the Sequence Project operator to reset its counters and start at 1 for every new segment / partition.

Read on to understand the issue and see Hugo’s proof.

Filtering Class Type In Server Audits

Solomon Rutzky shows how to filter the class_type field in a SQL Server audit to filter out scalar valued functions:

According to the documentation for CREATE SERVER AUDIT, I should be able to add a WHERE clause (starting in SQL Server 2012) to do simple filtering. The documentation states that the list of fields that can be filtered on is found in the documentation for the [sys.fn_get_audit_file] system function. Selecting from that function showed that the class_type field contains the “object type” (“FN” = Function, “P” = Stored Procedure, etc.) value. That’s just what I needed. So I’m just about done, right? Not so fast!

Nothing’s ever that easy, it seems.  Read on for the full story.

Mislabeled Column In dm_os_sys_memory

Lonny Niederstadt points out that the definition of a column in the sys.dm_os_sys_memory DMV is incorrect:

Based on the column names and values above, seems natural to think:
total_page_file_kb – available_page_file_kb = used page file kb
11027476 kb – 3047668 kb = 7979808 kb

Holy cow! Is my laptop using nearly as much paging space as there is RAM on the laptop??
Weird. If something forced that much paging space use relative to RAM on the laptop… I certainly wouldn’t expect system_memory_state_desc = ‘Available physical memory is high’!!

Read on for an explanation of what the columns actually mean.

Creating Powershell Documentation In VS Code

Rob Sewell has a post covering a nice addition to Visual Studio Code when you’re building Get-Help documentation for a cmdlet:

Now you can simply type <# and your help will be dynamically created. You will still have to fill in some of the blanks but it is a lot easier.

Here it is in action in its simplest form

But it gets better than that. When you add parameters to your function code they are added to the help as well. Also, all you have to do is to tab between the different entries in the help to move between them

Looks like a nice time-saver.

Log Info DMF

Andrew Pruski looks at the new sys.dm_db_log_info dynamic management function in SQL Server 2017:

This new DMF is intended to replace the (not so) undocumented DBCC LOGINFO command. I say undocumented as I’ve seen tonnes of blog posts about it but never an official Microsoft page.

This is great imho, we should all be analyzing our database’s transaction log and this will help us to just that. Now there are other undocumented functions that allow us to review the log (fn_dblog and fn_dump_dblog, be careful with the last one).

I’m glad to see this make the cut, as replacing informative / idempotent DBCC commands with DMVs makes it easier to query this data, particularly in conjunction with other DMVs or system tables.

Seeing Statistics In Execution Plans

Pedro Lopes announces that the statistics used to compile a plan are now available as part of the execution plan details:

OptimizerStatsUsage is available in cached plans, so getting the “estimated execution plan” and the “actual execution plan” will have this information.

In the above example, I see the ModificationCount is very high (almost as much as the table cardinality itself) which after closer observation, the statistic had been updated with NORECOMPUTE.

And looking and the Seek itself, there is a large skew between estimated and actual rows. In this case, I now know a good course of action is to update statistics. Doing so produces this new result: ModificationCounter is back to zero and estimations are now correct.

This will be a good addition to SQL Server 2017.

Code Beautifier

Dave Mason has a small application for auto-formatting code:

One of the challenges I’ve faced as a blogger is quickly reproducing code that looks good in HTML. I’ve tried a few different online code conversion sites, and even a C# library. But I never quite got the results I wanted. I found myself revising the HTML to fix up key words, comments, and operators that sometimes got missed, were the wrong color, or the wrong font. All that HTML editing was tedious and time consuming.

What I really wanted to do was copy my code from SSMS, Visual Studio, or the PowerShell ISE and paste it into my blog so I could spend less time as a web dev and more time writing. Since none of the other tools I found gave me the results I wanted, I wrote my own. The code for my little app is available on GitHub. (It’s my first project, so if something is missing or wrong, let me know.) It was created with Visual Studio 2015–there’s a compiled exe if you prefer that.

Check it out.

Code Formatting

Bert Wagner has a few tips on code formatting to make it more readable:

The second example above consistently indents lines, adds new lines, and follows consistent coding patterns. This makes it easy to skim the code quickly.

Books have chapters, headings, and paragraphs defined by formatting that make it easy to find what is needed at a glance — formatting code makes it possible to find things easily too.

The examples Bert uses are all C#, but apply to most languages.  I think consistency is key, even more so than your ideal format.  This reduces friction between developers, at least outside of the “what should our coding standards be?” meetings…

Stop Being Your Own Worst Enemy With Code

Bert Wagner has advice on making code understandable for future-you:

At the time I wrote it, I probably thought my code was beautiful. An elegant masterpiece. It should have been printed, framed, and hung on a wall of The Programming Hall of Fame. As clever as I thought I may have been a few years ago, I rarely am able to read my old code without some serious time wasted debugging.

This problem plagued me regularly. I tried different techniques to try and make my code easier to understand.

Bert has some good thoughts here, and I’ll add two small bits.  First, there’s a saying that it takes more mental effort to debug code than it takes to write it, so if you’re writing code at the edge of your understanding, effective debugging becomes difficult to impossible.  Second, unless you see a business rule frequently enough to internalize it, your greatest familiarity with the “whys” of the system is right when you are developing.  There is huge value in taking the time to document the rules in an accessible manner; even if you wrote the code, you probably won’t remember that weird edge case at 4 AM six months from now, when you need to remember it the most.

Accepting Risk

Daniel Hutmacher argues that modern companies have reached an inefficient risk equilibrium:

Which brings us to the matter of getting stuff done. Imagine if everything you do has to be approved by a stakeholder and a manager, every line of code you write is peer-reviewed, then tested in a dev test environment, then in an acceptance test environment (which should both contain reasonably fresh, yet scrambled copies of the production data), then approved for deployment by the stakeholder (who ideally should also take time to verify the results), and finally deployed to production by two other people, under a four-eyes principle where no single person can perform any change in production alone. Sprinkle this with a bunch of project meetings, all while leaving a long and winding trail of tickets and documentation.

This is how most development cycles look. Except, you know, the test environments are rarely fresh, the tests aren’t really that thorough, and the peer-review could probably be called a peer-glance at best.

A lot of this depends upon the industry and the likelihood that an outage will cause direct physical harm to people.  I’m personally ambivalent about where the right risk acceptance point is, but Daniel makes a good argument on the “accept more risk” side.


April 2018
« Mar