Doomed Transactions

Michael Swart talks about doomed transactions:

So the procedure was complicated and it used explicit transactions, but I couldn’t find any TRY/CATCH blocks anywhere! What I needed was a stack trace, but for T-SQL. People don’t talk about T-SQL stack traces very often. Probably because they don’t program like this in T-SQL. We can’t get a T-SQL stack trace from the SQLException (the error given to the client), so we have to get it from the server.

Michael shows how to get stack trace information and provides some advice on the process (mostly, “don’t do what we did”).


Copy-Only Backups

Kenneth Fisher explains what the COPY_ONLY flag means for a backup:

So to put that in simple terms. I have a database, Test. I take a full backup, changes happen, I take a differential backup, changes happen, I take a differential backup, etc. Ignoring all of the log backups that are happening if the database is in FULL recovery of course.

Currently, both differentials contain everything that has happened between the time of the full backup and the time the differential was taken. But what happens if I take a second full backup between the first and second differential? Now, that second differential will only contain data between the second full backup and the differential.
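To make the differential base idea concrete, here is a minimal sketch in Python rather than T-SQL. It is a toy model, not anything from Kenneth's post: the `Database` class, its methods, and the `run` helper are hypothetical names for illustration. The one behavior it assumes is that a regular full backup resets the differential base while a COPY_ONLY full backup does not.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy model: tracks which changes a differential backup would contain."""
    changes_since_base: list = field(default_factory=list)

    def change(self, name):
        self.changes_since_base.append(name)

    def full_backup(self, copy_only=False):
        # Assumption: a regular full backup resets the differential base,
        # while a COPY_ONLY full backup leaves it untouched.
        if not copy_only:
            self.changes_since_base = []

    def differential_backup(self):
        # A differential holds everything changed since the current base.
        return list(self.changes_since_base)


def run(second_full_copy_only):
    """Full, change A, diff, second full, change B, diff."""
    db = Database()
    db.full_backup()
    db.change("A")
    first_diff = db.differential_backup()
    db.full_backup(copy_only=second_full_copy_only)
    db.change("B")
    second_diff = db.differential_backup()
    return first_diff, second_diff


if __name__ == "__main__":
    print(run(second_full_copy_only=False))
    print(run(second_full_copy_only=True))
```

With a regular second full backup, the second differential holds only change B. With COPY_ONLY, it still holds both A and B, because the base is still the first full backup.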

Read on for more.

Animating Visuals In R

Tomaz Kastrun shows how to create animated charts in R using ggplot2:

In addition to R code, the ImageMagick program needs to be installed on your machine, as well. Also the speed, quality and many other parameters can be set, when creating animated gif.

Animated gif can be also included into your SSRS report, your Sharepoint site or any other site – like my blog 🙂 and it will stay interactive. In Power BI, importing animated gif as a picture, unfortunately will not work.

Be very careful with this, as not everything supports animated GIFs and you can make some really painful graphs if you try hard enough…

Session-Level Wait Stats

Arun Sirpal points out that SQL Server 2016 has a session-level wait stats DMV:

This tells me about the waits since my last reboot or since a manual reset of the stats. It’s probably why you should do at least time-based analysis or reset the wait stats before starting, that is if you are interested in something time specific or if you want to understand certain workloads at a given time.

So the other option is that you could go down the session level route. With the session based analysis I took the query and changed it slightly to query sys.dm_exec_session_wait_stats and also pull back the session_id that I am interested in.

I had no idea this was available, and it’s something that I’ve wanted for a very long time, so that’s excellent.

Setting Your Maximum Memory

Thomas Rushton provides a script to set max memory on a SQL Server instance:

The thing to do, ideally, is to configure the maximum server memory when you build the server; however, sometimes you walk into a place where there are many servers where this hasn’t been done, or are otherwise looking for a quick way to determine what the setting should be. Jonathan Kehayias of SQLSkills blogged about a sensible SQL Server Maximum memory calculation (in response to a post elsewhere about a really dodgy memory config advisor, but I’m not going to link to that…)

What I’ve done below is codify that knowledge into a nice friendly T-SQL query that you can run, below. It makes use of the sys.dm_os_sys_info DMV to get the memory physically in the server; that DMV, though, has changed form between SQL 2008R2 and SQL 2012, the new version reporting physical_memory_kb whereas the previous version had physical_memory_in_bytes. Hence a bit of dynamic SQL nastiness at the start of the query.
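For a feel of the arithmetic, here is a rough sketch in Python, not Thomas's T-SQL. It assumes the rule of thumb is: reserve 1 GB for the operating system, plus 1 GB per 4 GB of RAM up to 16 GB, plus 1 GB per 8 GB above 16 GB. That reading is my assumption, so check the original posts for the exact figures. Both function names are hypothetical; the two dictionary keys mirror the DMV column names mentioned above.

```python
def physical_memory_mb(sys_info: dict) -> int:
    """Normalize the two column shapes of sys.dm_os_sys_info to megabytes."""
    if "physical_memory_kb" in sys_info:  # SQL Server 2012 and later
        return sys_info["physical_memory_kb"] // 1024
    # Earlier versions report bytes instead.
    return sys_info["physical_memory_in_bytes"] // (1024 * 1024)


def suggested_max_server_memory_mb(total_mb: int) -> int:
    """Apply the assumed rule of thumb and return a max server memory in MB."""
    total_gb = total_mb // 1024
    reserved_gb = 1                             # operating system
    reserved_gb += min(total_gb, 16) // 4       # 1 GB per 4 GB up to 16 GB
    reserved_gb += max(total_gb - 16, 0) // 8   # 1 GB per 8 GB above 16 GB
    return max(total_mb - reserved_gb * 1024, 0)


if __name__ == "__main__":
    row = {"physical_memory_kb": 32 * 1024 * 1024}  # a 32 GB server
    print(suggested_max_server_memory_mb(physical_memory_mb(row)))
```

Under those assumptions, a 32 GB server reserves 7 GB and leaves 25600 MB for SQL Server. Treat the result as a starting point, not a final answer.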

Click through for the script, but make sure to tweak it for your environmental peculiarities.

Secure Enterprise Data Hub On Azure

James Morantus has a two-parter on Azure, Active Directory, and Cloudera’s enterprise data hub solution. Part one hits on DNS and Samba:

As you can see, the hostname -f command displays a very long FQDN for my VM and hostname -i gives us the IP address associated with the VM. Next, I did a forward DNS lookup using the host FQDN command, which resolved to the IP address. Then, I did a reverse DNS lookup using host IPaddress as shown in the red box above, it did not locate a reverse entry for that IP address. A reverse lookup is a requirement for a CDH deployment. We’ll revisit this later.

Part two looks at tying everything together in the Azure portal as well as within AD:

The remaining steps must be executed as the Cloudera Director admin user you created earlier. In my case, that’s the “azuredirectoradmin” account. All resources created by Cloudera Director in the Azure Portal will be owned by this account. The “root” user is not allowed to create resources on the Azure Portal.

First, we’ll need to create a SSH key as the “azuredirectoradmin” user on the VM where Cloudera Director is installed. This key will be added to our deployment configuration file, which will be added on all the VMs provisioned by Cloudera Director. This will allow us to use passwordless SSH to the cluster nodes with this key.

This isn’t trivial, but considering all that’s going on, it’s rather straightforward.

Grabbing Spark With sbt

Kevin Feasel



Ian Hellström shows how to create an sbt script to get a particular version of Spark:

If you have already installed sbt on your machine, read on. If not, have a look here on how to set up your machine.

With sbt available, create a folder in which you can play around, your ‘sandbox’. I’ll assume you have created the folder under /path/to/sandbox. On Windows, also create a sub-folder inside it for Spark’s so-called warehouse directory. Let’s call that sub-folder ‘warehouse’.

Click through for more details.

Querying The Histogram XE Target

Kendra Little shows how to query the histogram target for Extended Events:

I like using the histogram target because it’s relatively lightweight — you can “bucket” results by what you’re interested in. In my case, I was interested in seeing the cumulative number of file_read events by file name.

But there’s one problem: the histogram target is stored in memory, not in a data file. If you want to query that data and store it off in a table, it’s not obvious how to do that.

Click through to figure out how to do that.

DILM Reports

Andy Leonard walks through executing an Integration Services package and then seeing the results in an SSRS report:

One of the first Data Integration Lifecycle Management (DILM) Suite solutions I built was Catalog Reports. Catalog Reports is a relatively simple and straightforward version of some of the SSIS Catalog Reports embedded in SSMS. The main difference is Catalog Reports is a SQL Server Reporting Services (SSRS) solution.

It’s free.

And it’s open source. Here’s a screenshot of the Overview Report for the same execution above

Check it out.

Word Count In Spark 2.0

Anubhav Tarar has a word count app for Spark 2.0:

Now you have to perform the given steps:

  • Create a spark session from the org.apache.spark.sql.SparkSession API and specify your master and app name

  • Using the method, read from the file wordcount.txt the return value of this method in a dataset. In case you don’t know what a data set looks like you can learn from this link.

  • Split this dataset of type string with white space and create a map which contains the occurrence of each word in that data set.

  • Create a class prettyPrintMap for printing the result to console.
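The split-and-count steps above can be sketched in plain Python. This is an analogue of the logic only, not Spark code and not Anubhav's implementation: there is no Spark session or Dataset here, and `word_count` and `pretty_print_map` are hypothetical names, the latter loosely echoing the prettyPrintMap class in the list.

```python
from collections import Counter


def word_count(lines):
    """Split each line on whitespace and count occurrences of each word."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return dict(counts)


def pretty_print_map(counts):
    """Print one 'word: count' pair per line, sorted by word."""
    for word, n in sorted(counts.items()):
        print(f"{word}: {n}")


if __name__ == "__main__":
    # Stand-in for the contents of wordcount.txt.
    sample = ["hello spark", "hello world"]
    pretty_print_map(word_count(sample))
```

In the Spark version, the list of lines would be a Dataset of strings read from the file, and the counting would run across the cluster rather than in a single loop.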

This Hello World app is a bit longer than the sheer minimum code necessary, as it includes a class for formatting results and some error handling.

