Press "Enter" to skip to content

Author: Kevin Feasel

Reusing U-SQL Scripts

Matthew Hicks shows how to use PowerShell to parameterize U-SQL scripts:

You can use this feature either via Azure Cloud Shell or on a Windows machine with Azure PowerShell installed.

When submitting, simply construct a hashtable of U-SQL variable names to values and pass it in using the -ScriptParameter cmdlet parameter. The .NET type of each value in the hashtable is used when defining the variable in U-SQL.

Supported types include:

byte, sbyte, int, uint (or uint32), long, ulong (or uint64), float, double, decimal, short (or int16), ushort (or uint16), char, string, DateTime, bool, Guid, or byte[]

Read on for an example of the process.

Setting Up A Test Lab Domain Controller

David Fowler has a new series on building a test lab, starting with a domain controller:

One of the most useful tools to the DBA when we need to test new features, recreate a fault that we’ve seen in production or just want to see ‘what if…?’ is a test lab.

Some of you are going to be lucky enough to have a few servers kicking around, or a chunk of a virtual environment that you can build a test lab in, but not all of us do. In this series of posts I’m going to look at how we can build up a fully functioning test lab, consisting of a domain and clustered SQL Servers, on our desktop PC. Now, although I’m going to be building this environment on my desktop, the main steps will be the same if you’ve got separate hardware for this, so they may still be relevant.

So, in this series we’re going to build a virtual test lab that’s going to consist of a domain controller and a couple of SQL Servers in a Windows Failover Cluster, hosting an Availability Group.

Read on for a step-by-step guide using VirtualBox to build these VMs.

New SQL Operations Studio

Alan Yu announces the December release of SQL Operations Studio:

Download SQL Operations Studio and review the Release Notes to get started.

SQL Operations Studio is a data management tool that enables you to work with SQL Server, Azure SQL DB and SQL DW from Windows, macOS and Linux. To learn more, visit our GitHub.

While it’s downloading, check out the new bits and also Marek Masko’s guide on SQL Operations Studio:

If you find the Go to Definition feature very useful, but you would like to have a more temporary view of an object’s source code, then no problem! SQL Ops Studio delivers a second piece of functionality which gives you the possibility to check an object’s definition. This time, the source code is displayed in the same editor tab.

Marek’s post went up before the December release, so one big point (about not being able to see actual execution plans) is now fixed.

Error Handling In PowerShell In SQL Agent Jobs

Ben Miller has a couple of tips when executing PowerShell within SQL Agent jobs:

What you don’t see is the way you have the job step succeed or fail. When using most commands in modules, you may find that errors still cause the step to fail because of the way they report the failure (some kind of throw or a Stop condition outside your control). So if you want the step to fail on errors that would normally just show red on the screen while execution carries on, remember that the default ErrorAction is Continue: even though you get an error, PowerShell will just keep going.

Read on for more.

Fragmentation Can Affect Execution Plans

Jonathan Kehayias explains how index fragmentation can potentially affect execution plans:

Index fragmentation removal and prevention has long been a part of normal database maintenance operations, not only in SQL Server, but across many platforms. Index fragmentation affects performance for a lot of reasons, and most people talk about the effects of random small blocks of I/O that can happen physically to disk based storage as something to be avoided. The general concern around index fragmentation is that it affects the performance of scans through limiting the size of read-ahead I/Os. It’s based on this limited understanding of the problems that index fragmentation causes that some people have begun circulating the idea that index fragmentation doesn’t matter with Solid State Storage devices (SSDs) and that you can just ignore index fragmentation going forward.

However, that is not the case for a number of reasons. This article will explain and demonstrate one of those reasons: that index fragmentation can adversely impact execution plan choice for queries. This occurs because index fragmentation generally leads to an index having more pages (these extra pages come from page split operations, as described in this post on this site), and so the use of that index is deemed to have a higher cost by SQL Server’s query optimizer.

Let’s look at an example.

Check out the example, but definitely read the comments as there are some good conversations in there.

Pipes And More Pipes In R

Gabriel (de Selding?) has a tutorial on how to use the various pipes in R:

In F#, the pipe-forward operator |> is syntactic sugar for chained method calls. Or, stated more simply, it lets you pass an intermediate result onto the next function.

Remember that “chaining” means that you invoke multiple method calls. As each method returns an object, you can actually allow the calls to be chained together in a single statement, without needing variables to store the intermediate results.

In R, the pipe operator is, as you have already seen, %>%. If you’re not familiar with F#, you can think of this operator as being similar to the + in a ggplot2 statement. Its function is very similar to that of the F# operator you have just seen: it takes the output of one statement and makes it the input of the next statement. When describing it, you can think of it as a “THEN”.
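
If Python is more your speed than R or F#, the same idea of handing one step’s output to the next can be sketched with pandas’ .pipe method. This is only a rough analogue, with made-up data and helper functions, not anything from the tutorial itself:

```python
import pandas as pd

def drop_missing(df):
    """Remove rows with any missing values."""
    return df.dropna()

def add_total(df):
    """Add a total column from two hypothetical component columns."""
    return df.assign(total=df["price"] * df["quantity"])

orders = pd.DataFrame({
    "price": [10.0, 12.5, None],
    "quantity": [2, 1, 4]
})

# Each .pipe passes the current data frame into the next function,
# much like x %>% f() %>% g() in R or x |> f |> g in F#.
result = orders.pipe(drop_missing).pipe(add_total)
print(result)
```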

Auto-recommended for the F# love, and a good tutorial to boot.

John Mount has a few interesting notes on the topic as well; read on for those, too.

Moving From reshape2 To tidyr

Martin Johnsson talks about a couple of tricky bits when moving from reshape2 to tidyr:

In practice, I don’t think people always take their data frames all the way to tidy. For example, to make a scatterplot, it is convenient to keep a couple of variables as different columns. The key is that we need to move between different forms rapidly (brain time-rapidly, more than computer time-rapidly, I might add).

And not everything should be organized this way. If you’re a geneticist, genotypes are notoriously inconvenient in normalized form. Better keep that individual by marker matrix.

The first serious piece of R code I wrote for someone else was a function to turn data into long form for plotting. I suspect plotting is often the gateway to tidy data. The function was like what you’d expect from R code written by a beginner who comes from C-style languages: It reinvented the wheel, and I bet it had nested for loops, a bunch of hard bracket indices, and so on. Then I discovered reshape2.

I’d not used reshape2 before, having started with tidyr, so it was interesting to see the contrast.
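
Reshaping to long form is not unique to R, either; pandas has its own melt for the same wide-to-long step. Here is a hypothetical sketch (made-up data, not Martin’s) of the “long form for plotting” idea:

```python
import pandas as pd

# Wide format: one row per sample, one column per measurement (made-up data).
wide = pd.DataFrame({
    "sample": ["s1", "s2", "s3"],
    "weight": [2.1, 2.4, 1.9],
    "length": [10.0, 11.2, 9.8]
})

# Long format: one row per sample/measurement pair, which is what most
# plotting libraries want when colouring or faceting by variable.
long_form = wide.melt(id_vars="sample", var_name="variable", value_name="value")
print(long_form)
```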

Data Manipulation In R

Steph Locke has a new book out:

Data Manipulation in R is the second book in my R Fundamentals series that takes folks from no programming knowledge through to an experienced R user. Working with R focussed on the very basics and this second book covers data wrangling.

Introducing the core skill of working with tabular data, this book goes from importing data, to analysing it, and then getting it back out for consumption elsewhere. It leans heavily on the tidyverse, and I think it’s an accessible introduction for those new to analysing data.

The book is free if you have Kindle Unlimited; otherwise, it’s not expensive at all.

Python Data Frames In ML Services

Robert Sheldon continues his SQL Server Machine Learning Services series by looking at Python data frames:

This article focuses on using data frames in Python. It is the second article in a series about MLS and Python. The first article introduced you briefly to data frames. This article continues that discussion, describing how to work with data frame objects and the data within those objects.

Data frames and the functions they support are available to MLS and Python through the pandas library. The library is available as a Python module that provides tools for analyzing and manipulating data, including the ability to generate data frame objects and work with data frame data. The pandas library is included by default in MLS, so the functions and data structures available to pandas are ready to use, without having to manually install pandas in the MLS library.

There’s quite a bit to this article, making it an interesting read.
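
If you haven’t touched pandas before, a minimal standalone sketch of the data frame basics looks something like this; the table and values are made up, and inside ML Services the same calls would run within an sp_execute_external_script block rather than a plain Python script:

```python
import pandas as pd

# Build a small data frame from a dictionary (hypothetical data).
df = pd.DataFrame({
    "Name": ["Anna", "Bob", "Cara"],
    "Department": ["Sales", "IT", "Sales"],
    "Salary": [52000, 61000, 58000]
})

# Inspect structure and contents.
print(df.dtypes)
print(df.head())

# Filter rows and select columns.
sales = df.loc[df["Department"] == "Sales", ["Name", "Salary"]]
print(sales)

# Aggregate by group.
print(df.groupby("Department")["Salary"].mean())
```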

Multi-Object JSON Arrays In SQL Server

Bert Wagner shows how to build JSON arrays in SQL Server:

When using FOR JSON PATH, ALL rows and columns from that result set will get converted to a single JSON string.

This creates a problem if, for example, you want to have a column for your JSON string and a separate column for something like a foreign key (in our case, HomeId). Or if you want to generate multiple JSON strings filtered on a foreign key.

The way I chose to get around this is to use CROSS APPLY with a join back to our Home table — this way we get our JSON string for either Cars or Toys created but then output it along with some additional columns.
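
To picture the shape Bert ends up with, here is a small hypothetical Python sketch (not his T-SQL, and made-up rows) that produces one JSON array per HomeId rather than a single string for the whole result set:

```python
import json
from collections import defaultdict

# Hypothetical child rows, each tagged with the HomeId foreign key.
cars = [
    {"HomeId": 1, "Make": "Volvo", "Year": 2015},
    {"HomeId": 1, "Make": "Saab", "Year": 2009},
    {"HomeId": 2, "Make": "Ford", "Year": 2018},
]

# Group by the foreign key, then serialize each group separately, giving
# one JSON string per HomeId instead of one string for everything.
grouped = defaultdict(list)
for row in cars:
    grouped[row["HomeId"]].append({k: v for k, v in row.items() if k != "HomeId"})

for home_id, items in grouped.items():
    print(home_id, json.dumps(items))
```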

Impedance mismatch? What impedance mismatch?
