Disambiguating The Confusion Matrix

John Cook walks through a set of valuable terms derived from the core components of the confusion matrix:

How many terms are possible? There are four basic ingredients: TP, FP, TN, and FN. So if each term may or may not be included in a sum in the numerator and denominator, that’s 16 possible numerators and 16 denominators, for a total of 256 possible terms to remember. Some of these are redundant, such as one(a.k.a. ONE), given by TP/TP, FP/FP, etc. If we insist that the numerator and denominator be different, that eliminates 16 possibilities, and we’re down to a more manageable 240 definitions. And if we rule out terms that are the reciprocals of other terms, we’re down to only 120 definitions to memorize.

And of those, John points out the handful which are generally important, providing us an excellent table with definitions of commonly-used terms.

Thoughts On The Evolution Of Big Data

Praveen Sripati shares an opinion on where the various Hadoop and Big Data platforms are headed:

The different Cloud Vendors had been offering Big Data as a service for quite some time. Athena, EMR, RedShift, Kinesis are a few of the services from AWS. There are similar offerings from Google CloudMicrosoft Azure and other Cloud vendors also. All these services are native to the Cloud (built for the Cloud) and provide tight integration with the other services from the Cloud vendor.

In the case of Cloudera, MapR and HortonWorks the Big Data platforms were not designed with the Cloud into considerations from the beginning and later the platforms were plugged or force fitted into the Cloud. The Open Hybrid Architecture Initiative is an initiative by HortonWorks to make their Big Data platform more and more Cloud native.

It’ll be interesting to see where this goes.

Databricks Cluster-Scoped Init Scripts

Aayush Bhasin shares some background on a Databricks intern project, adding cluster-scoped initialization scripts to Databricks clusters:

One of the biggest pain points for customers used to be that init scripts for a cluster were not part of the cluster configuration and did not show up in the User Interface. Because of this, applying init scripts to a cluster was unintuitive, and editing or cloning a cluster would not preserve the init script configuration. Cluster-scoped init scripts addressed this issue by including an ‘Init Scripts’ panel in the UI of the cluster configuration page, and adding an ‘init_scripts’ field to the public API. This also allows init scripts to take advantage of cluster access control.

Read on to see how Aayush & co. solved this issue.

A Look At Automatic Plan Tuning In SQL Server 2017

John Sterrett shows us how to review automatic plan tuning suggestions:

SQL Server 2017 Automatic Tuning looks for queries where execution plans change and performance regresses. This feature depends on Query Store being enabled. Note, even if you don’t turn on Automatic Tuning you still get the benefits of having access to the data. That is right. Automatic Tuning would tell you what it would do if it was enabled.  Think of this as free performance tuning training.  Go look at the DMVs and try to understand why the optimizer would want to lock in an execution plan. We will actually go through a real-world example:

Click through for the example.  I wouldn’t automatically trust these automatic results, but my experience has been generally positive.

Customizing SSMS Using Powershell

Josh Simar shows us that you don’t need a GUI to customize SQL Server Management Studio:

There has been many a blog post out there that shows you some of the great ways you can customize the look and feel of your management studio windows. One of my favorite recently published ones is from the great people at Brent Ozar Unlimited (Brent himself in this case) here.

There is even many a blog post out there about how you can export your settings from one and import it to another here.

All of the blog posts I’ve read however have one problem for me. They seem very “clicky”, as in a lot of clicking of the mouse (or keyboard shortcuts) has to happen every time you want to import your settings and I thought there has to be a more programatic way.

After all, many of us espouse to the DRY (Don’t Repeat Yourself) technique when coding. Why wouldn’t we want to trim time and energy off of something as basic as a settings import.

And Josh has that solution.

The Costs And Benefits Of Triggers

Rob Farley hits on my biggest complaint about triggers:

But the problem isn’t what they do, it’s that you didn’t know it was there. Just like asbestos.

People complain about nested triggers, about triggers that are overly complex, about triggers which do too many things… but stored procedures have all these faults too. It’s just that you knew that the stored procedure was there, and you didn’t know that the trigger was there.

I wish that triggers were more visible inside the tools (Management Studio / Operations Studio / Visual Studio), right alongside stored procedures and procedural functions (those ones that use BEGIN & END and are bad, rather than inline functions). They’re code and should be treated as such.

Hiding triggers under tables makes it easy to forget about them, at least until you get some unexpected results.

When Scalar Functions Go Bad

Daniel Janik head-fakes us a few times when looking at scalar user-defined function performance:

I’ve read a lot of things lately pointing to scalar functions as if they were the devil. In this blog I’m going to explore if that’s the case. Let’s have a look.

It’s true that in many situations a scalar function is often a performance bottleneck; but, is there a situation where they could be responsibly used?

What if you had a lookup table that almost never changed? Is it worth doing a join on the lookup to get the data you need?

Let’s examine a simple join between a customer address and a state lookup table.

Things are not always as they seem.

The Benefits Of DML Triggers

Kendra Little tells us a tale of trigger value:

When I first edged my way into a Junior DBA-ish role, I worked with a complex application with many large databases. Customers loaded and configured data into a (mostly) OLTP style database, and then database was replicated to multiple other systems — some to publish data to and adserving platform, and some to transform the data for reporting.

Triggers were used extensively in these downstream systems to detect changes. It went like this:

  • Transactional replication publications were set up on the OLTP  (ish) database.

  • Transactional replication subscriptions were set up on each downstream server. A dedicated database was used for replication articles on each instance.

  • After replication was initialized, DML triggers were created on each article in the subscriber database. For each modification, the trigger would insert a narrow row into a “delta” table related to that article.

  • The “delta” tables were in their own schema, and contained row identifiers, timestamp columns, and custom indexes for processing. This enabled batches to be efficiently pulled from these tables for processing into the related system.

  • Cleanup processes periodically pulled processed rows out of the delta tables in the background (and indexes were designed to prevent the cleanup process from fighting with data processing jobs and inserts)

Read the whole thing.  There are some things that triggers can do easily which would be difficult to handle otherwise, but they can also be dangerous in the wrong hands.

Power BI Report Layouts

Jeanne Combrinck shows us an easy method of creating Power BI layouts:

PowerBI has this great functionality where you can go and download preset layouts which make your PowerBI reports stand out more. There is a nice trick to doing this yourself.

You can create layouts in PowerPoint and then save them as images. Then insert them into your PowerBI report as an image and send the image right to the back.

Click through for an example of this in action.

Categories

September 2018
MTWTFSS
« Aug  
 12
3456789
10111213141516
17181920212223
24252627282930