Launching A Sparklyr Cluster

David Smith shows how to launch a sparklyr cluster in Azure:

When you’re finished, shut down your cluster using the aztk spark cluster delete command. (While you can delete the nodes from the Pools view in the Azure portal, the command does some additional cleanup for you.) You’ll be charged for each node in the cluster at the usual VM rates for as long as the cluster is provisioned. (One cost-saving option is to use low-priority VMs for the nodes, for savings of up to 90% compared to the usual rates.)

That’s it! Once you get used to it, it’s all quick and easy — the longest part is waiting for the cluster to spin up in Step 5. This is just a summary, but the full details see the guide SparklyR on Azure with AZTK.

It’ll take a bit more than five minutes to get started, but it is a good sight easier than building the servers yourself.

Flotilla: Containerized Task Execution

Akshay Wadia, et al, introduce a new open source tool:

Data scientists are not always equipped with the requisite engineering skills to deploy robust code to a production job execution and scheduling system. Yet, forcing reliance on data platform engineers will impede the scientists’ autonomy. If only there was another way.

Today we’re excited to introduce Flotilla, our latest open source project. Flotilla is a human friendly service for task execution. It allows you to focus on the work you’re doing rather than how to do it. In other words, Flotilla takes the struggle out of defining and running containerized jobs.

It looks like an interesting service.

dbatools Making Extended Events Easier

Chrissy LeMaire solves a bunch of common problems with Extended Events:

So why do people keep using traces? We compiled a list of reasons from Erin Stellato’s Why do YOU avoid Extended Events. And this list is LONG!

  • Traces are straightforward and less complex than Extended Events
    Totally seems that way!
  • Traces provide a consistent interface for mixed environments
    Whether you use SQL Server 7 or SQL Server 2017, the interface is pretty much the same.
  • Traces are faster to setup quick traces
    Just open up Profiler, connect to a server, click a few times and you’re set.
  • “Extended Events are more efficient for the SQL Server engine, but not more efficient for the DBA”
    Love this quote.
  • People already have a library of Profiler templates
    Including me
  • Ignorance of XML / Querying all the generated XML is outrageous
    When I first saw what it takes to query Extended Events, I bailed immediately. I am not learning XPATH, ever.
  • Templates work remotely across all instances
    This is also true for Extended Events, but the commenter did not know that.
  • XEvents are persistent and must be stopped manually
    Once you close Profiler or restart SQL Server, all non-default traces will disappear. Extended Events will persist until you delete them.
  • Ability to import PerfMon data and look at Trace and PerfMon counter data at the same time
    Most people that use Profiler don’t seem to know about this feature but those who do LOVE it. You can read more at Brad McGehee’s Correlating SQL Server Profiler with Performance Monitor.

    Microsoft reportedly has no plans to provide this functionality.

  • Consistent user experience across SSAS and Database Engine
    Gotta take their word, I don’t use SSAS.
  • It’s easy to train others to use Profiler
    Imagine – if it’s easier to learn Profiler, it’ll be far easier to teach.
  • Traces can be easily replayed
    There are a number of Microsoft tools to replay traces, but none to replay Extended Events.
  • MS Premier Support still asks for traces
    Likely because they also have tools that they want to work across all supported versions, which still includes SQL Server 2008 R2.
  • xe_file_target_read_file is a CPU hog
    This wasn’t listed on Erin’s page but was told to me while I was performing my research.

Whewf! That’s a lot of compelling reasons not to make the switch. So let’s see how we can address each of them using PowerShell. All code listed here can be found at

Chrissy & co can’t solve all of them, but they solve a majority.  Click through to see how.

Temp Tables In Redshift

Derik Hammer has some notes on temporary tables in Amazon Redshift:

One difference between regular tables and temporary tables is how they are typically used. Temporary tables are session scoped which means that adding them into a process or report will probably cause them to be created multiple times. Temporary tables might be very similar to regular tables but most regular tables are not re-written into, every time they are queried.

The disk writes involved in populating the temporary table might be more expensive than the reads would be if you were to modify your query to include the logic into one, larger, query. The frequency of the report or process will be a factor into how much of a performance hit you get by using the temporary tables. If you are using temporary tables to make debugging a procedure easier or to enhance readability, make sure you understand the IO cost of performing writes and then reading that data back into a subsequent query.

Read on for more.

Query Store And CU4

Erin Stellato notes that if you ran SQL Server 2017 CU2 with Query Store, you want to upgrade and run a special script:

There is a Query Store fix in the latest Cumulative Updates of SQL Server 2017, and if you are keeping current with the latest cumulative updates, there’s a script you need to run if you’ve installed CU2 at any point.

First, if you are still on CU2, you might want to consider upgrading to CU4 (released February 17, 2018).

Second, if you previously had CU2 installed (and are now on CU3 or CU4), you want to make sure that you’ve run the script included with CU3 and CU4 release notes.

The script removes plans from Query Store that were captured when running CU2.

Read the whole thing and keep those servers patched.

Replication Update: Dynamic Parameter Loading

Sourabh Agarwal announces a new feature with SQL Server replication:

When a new agent user profile is created and assigned to an agent, relevant entries are added to the [dbo].[MSagent_profiles] and the [dbo].[MSagent_parameters] tables in the MSDB database on the Distributor server. The values in the tables are updated whenever a parameter value is updated for the Profile. These values are reloaded from the tables, whenever a replication agent is restarted. As part of the improvement a new profile parameter “ProfileReloadInterval” with a default value of 300 seconds, was added to the profiles. This parameter dictates the interval at which the agent reloads the values from the tables. If using the default value, the agent will try and reload the values from the MSDB tables every 5 minutes. If it encounters that any of the above-mentioned parameter values have changed, the new values will be used by the agents.

I don’t work with replication enough to really see the benefit, but I’m sure somebody is giddy about this.

Special Commands In mssql-cli

Kevin Feasel



Alan Yu announces an update to mssql-cli:

Special commands are meant to make your life easier. They are shortcuts to perform common tasks and queries. All special commands start with a backslash (\), and you can use the built-in IntelliSense to see a list of special commands you can use. You can also learn more by running the following command:

$ mssql-cli>\?

To start off, let’s say you have a common query you run often, but don’t want to type it out each time or scroll through your history. You can “save” your query by running the following special command:

$ mssql-cli>\sn address select * from "Person"."AddressType"

To execute this query, simply run:

$ mssql-cli>\n address

Click through to learn more and see it in action.

Kerberos And SQLCMD

Kenneth Fisher notes a behavior change for SQLCMD with SQL Server 2014 and later:

This error is seen when using a Windows Authenticated id and NTLM or an untrusted Kerberos connection. From what I understand, with Kerberos the client (SQL in this case) gets a ticket containing your credentials, it can then pass this ticket on to other servers that trust it to have authenticated you to be sure you are who you say you are. Note the mention of trust. This means that even if you are using Kerberos if the two servers aren’t in a trusted relationship then you will still see the error. And NTLM doesn’t have the option to do this at all (again, not sure why).

At this point it’s important to note that this is not a SQL Server error. You will see this error in other places where you bounce across multiple machines and aren’t using Kerberos with a trusted relationship. I’ve seen it when trying to access files before (it was a weird situation admittedly).

Read on to learn about the ramifications of this behavioral change and how you can prevent double-hop problems when running newer versions of SQLCMD.


February 2018
« Jan Mar »