Press "Enter" to skip to content

Curated SQL Posts

New SQL Operations Studio

Alan Yu announces the December release of SQL Operations Studio:

Download SQL Operations Studio and review the Release Notes to get started.

SQL Operations Studio is a data management tool that enables you to work with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux. To learn more, visit our GitHub.

While it’s downloading, check out the new bits and also Marek Masko’s guide on SQL Operations Studio:

If you find Go to Definition feature very useful, but you would like to have a more temporary view of objects source code, then no problem! SQL Ops Studio delivers second functionality which gives you the possibility to check object definition. This time, the source code is displayed in the same editor tab.

Marek’s post went up before the December release, so one big point (about not being able to see actual execution plans) is now fixed.

Comments closed

Error Handling In Powershell In SQL Agent Jobs

Ben Miller has a couple of tips when executing Powershell within SQL Agent jobs:

What you don’t see is the way you have the job step succeed or fail. When using most commands in modules, you may find that errors still cause the step to fail because of the way they report the failure (some kind of throw or a Stop condition outside your control). So if you want things to fail that normally would show red on the screen but things would continue, remember that the default ErrorAction is Continue, so even though you get an error, PowerShell will just continue.

Read on for more.

Comments closed

Fragmentation Can Affect Execution Plans

Jonathan Kehayias explains how index fragmentation can potentially affect execution plans:

Index fragmentation removal and prevention has long been a part of normal database maintenance operations, not only in SQL Server, but across many platforms. Index fragmentation affects performance for a lot of reasons, and most people talk about the effects of random small blocks of I/O that can happen physically to disk based storage as something to be avoided. The general concern around index fragmentation is that it affects the performance of scans through limiting the size of read-ahead I/Os. It’s based on this limited understanding of the problems that index fragmentation cause that some people have begun circulating the idea that index fragmentation doesn’t matter with Solid State Storage devices (SSDs) and that you can just ignore index fragmentation going forward.

However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.

Let’s look at an example.

Check out the example, but definitely read the comments as there are some good conversations in there.

Comments closed

Pipes And More Pipes In R

Gabriel (de Selding?) has a tutorial on how to use the various pipes in R:

In F#, the pipe-forward operator |> is syntactic sugar for chained method calls. Or, stated more simply, it lets you pass an intermediate result onto the next function.

Remember that “chaining” means that you invoke multiple method calls. As each method returns an object, you can actually allow the calls to be chained together in a single statement, without needing variables to store the intermediate results.

In R, the pipe operator is, as you have already seen, %>%. If you’re not familiar with F#, you can think of this operator as being similar to the +in a ggplot2 statement. Its function is very similar to that one that you have seen of the F# operator: it takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a “THEN”.

Auto-recommended for the F# love, and a good tutorial to boot.

John Mount has a few interesting notes on the topic:

Read on for the rest of his notes, too.

Comments closed

Moving From reshape2 To tidyr

Martin Johnsson talks about a couple tricky bits when moving from reshape2 to tidyr:

In practice, I don’t think people always take their data frames all the way to tidy. For example, to make a scatterplot, it is convenient to keep a couple of variables as different columns. The key is that we need to move between different forms rapidly (brain time-rapidly, more than computer time-rapidly, I might add).

And not everything should be organized this way. If you’re a geneticist, genotypes are notoriously inconvenient in normalized form. Better keep that individual by marker matrix.

The first serious piece of R code I wrote for someone else was a function to turn data into long form for plotting. I suspect plotting is often the gateway to tidy data. The function was like what you’d expect from R code written by a beginner who comes from C-style languages: It reinvented the wheel, and I bet it had nested for loops, a bunch of hard bracket indices, and so on. Then I discovered reshape2.

I’d not used reshape2 before, having started with tidyr, so it was interesting to see the contrast.

Comments closed

Data Manipulation In R

Steph Locke has a new book out:

Data Manipulation in R is the second book in my R Fundamentals series that takes folks from no programming knowledge through to an experienced R user. Working with Rfocussed on the very basics and this second book covers data wrangling.

Introducing the core skill of working with tabular data, this book goes from importing data, to analysing it, and then getting it back out for consumption elsewhere. Leaning heavily on the tidyverse, I think it’s an accessible introduction for those new to analysing data.

The book is free if you have Kindle Unlimited; otherwise, it’s not expensive at all.

Comments closed

Python Data Frames In ML Services

Robert Sheldon continues his SQL Server Machine Learning Services series by looking at Python data frames:

This article focuses on using data frames in Python. It is the second article in a series about MLS and Python. The first article introduced you briefly to data frames. This article continues that discussion, describing how to work with data frame objects and the data within those objects.

Data frames and the functions they support are available to MLS and Python through the pandas library. The library is available as a Python module that provides tools for analyzing and manipulating data, including the ability to generate data frame objects and work with data frame data. The pandas library is included by default in MLS, so the functions and data structures available to pandas are ready to use, without having to manually install pandas in the MLS library.

There’s quite a bit to this article, making it an interesting read.

Comments closed

Multi-Object JSON Arrays In SQL Server

Bert Wagner shows how to build JSON arrays in SQL Server:

When using FOR JSON PATH, ALL rows and columns from that result set will get converted to a single JSON string.

This creates a problem if, for example, you want to have a column for your JSON string and a separate column for something like a foreign key (in our case, HomeId). Or if you want to generate multiple JSON strings filtered on a foreign key.

The way I chose to get around this is to use CROSS APPLY with a join back to our Home table — this way we get our JSON string for either Cars or Toys created but then output it along with some additional columns.

Impedance mismatch?  What impedance mismatch?

Comments closed

Parallel CHECKDB

Arun Sirpal shows us when DBCC CHECKDB can go parallel:

DBCC CHECKDB has the ability to perform parallel checking of objects. However, it absolutely depends on the edition of SQL Server, it only happens when using enterprise edition.

Let’s see this in action. I propose the following tests for this blog post:

  • Test on a SQL Server Enterprise Edition.
  • Test on a non-enterprise edition of SQL Server.

I don’t have 2017 Enterprise at hand but I do have 2014 Enterprise and Express handy so it makes sense to use these versions for my “experiment”.

Read on to see the difference.

Comments closed

DataRow To JSON With Powershell

Rob Sewell shows how to convert a .NET DataRow into its JSON form using Powershell:

I wanted to be able to Mock $variable. I wrapped the code above in a function, let’s call it Run-Query

Which meant that I could easily separate it for mocking in my test. I ran the code and investigated the $variable variable to ensure it had what I wanted for my test and then decided to convert it into JSON using ConvertTo-Json

Read on to see the fun mess that ConvertTo-Json made and then Rob’s simplification.

Comments closed