Understanding Decision Trees

Ramandeep Kaur explains how decision trees work:

Simply put, a decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.

It is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems and works for both categorical and continuous input and output variables. It is one of the most widely used and practical methods for Inductive Inference. (Inductive inference is the process of reaching a general conclusion from specific examples.)

Decision trees learn and train itself from given examples and predict for unseen examples.

Click through for an example of implementing the ID3 algorithm and generating a decision tree from a data set.

Bad Parameter Sniffing Flowchart

Grant Fritchey is asking for input on a new flowchart he has created:

Lots of people are confused by how to deal with bad parameter sniffing when it occurs. In an effort to help with this, I’m going to try to make a decision flow chart to walk you through the process. This is a rough, quite rough, first draft.

I would love to hear any input. For this draft, I won’t address the things I think I’ve left out. I want to see what you think of the decision flow and what you think might need to be included. Click on it to embiggen.

I think it’s a great first step.  I think a decision to add local variables and use them instead of parameters would be useful, particularly in contrast to using RECOMPILE and OPTIMIZE FOR UNKNOWN.

Checkpointing Code For Reproduction

Kevin Feasel

2017-08-16

R

David Smith tells an interesting story about a reproducibility problem with data analysis:

Timo Grossenbacher, data journalist with Swiss Radio and TV in Zurich, had a bit of a surprise when he attempted to recreate the results of one of the R Markdown scripts published by SRF Data to accompany their data journalism story about vested interests of Swiss members of parliament. Upon re-running the analysis in R last week, Timo was surprised when the results differed from those published in August 2015. There was no change to the R scripts or data in the intervening two-year period, so what caused the results to be different?

The version of R Timo was using had been updated, but that wasn’t the root cause of the problem. What had also changed was the version of the dplyr package used by the script: version 0.5.0 now, versus version 0.4.2 then. For some unknown reason, a change in the dplyr package in the intervening package caused some data rows (shown in red above) to be deleted during the data preparation process, and so the results changed.

Click through for the solution, which is pretty easy in R.

Memory-Optimized Object Isolation Levels

Ned Otter looks at the isolation levels offered when you work with memory-optimized objects:

If you are only querying on-disk tables, you can use any of the isolations levels from List 1. And if you are only querying memory-optimized tables, you can use any of the isolation levels from List 2.

But what if you want to reference both on-disk and memory-optimized tables in the same query? Of course, the answer is “it depends”, with transaction initiation modes and isolation levels being the components of that dependency.

This post is part one of a series and is mostly around level-setting.

What Is DAX?

Matt Allington covers some of the basics of DAX:

Do I need to learn the DAX language?

You certainly do not need to know how to write DAX to get started with Power BI.  Power BI is the newest business intelligence tool that leverages the DAX language (via Power Pivot) and it is definitely possible to get started and build some reports without learning any DAX at all.   If you are a “consumer of reports” that other people produce for you then you certainly don’t need to learn any DAX.  However if you are someone that wants to do your own adhoc (or structured) analysis of data using Power BI, Power Pivot for Excel, then you will definitely want to learn to write some DAX in order to get value from what these new tools have to offer.

It’s a good intro if you aren’t familiar with DAX.

Thoughts On CLR Strict Security

Solomon Rutzky has started a series on CLR in SQL Server 2017 and lays down a gauntlet:

What all of that means is that, assuming clr strict security is “1” (i.e. enabled), and TRUSTWORTHY is “OFF” for the Database in which an Assembly is being created, then in order to create any Assembly you first need to:

  1. Sign the Assembly with a strong-name key or a certificate
  2. Create an Asymmetric Key or Certificate in master from whatever you signed the Assembly with
  3. Create a Login based on that Asymmetric Key or Certificate
  4. Grant that Login the UNSAFE ASSEMBLY permission

Is that really so bad? Aren’t many of us (hopefully!) already doing that?

Solomon’s not very happy with the way that CLR security works in 2017, but he does have solutions of his own in mind.

CROSS APPLY Replacing REPLACE

Kevin Feasel

2017-08-16

T-SQL

Bert Wagner shows off a good use of the APPLY operator:

Here we only have 4 nested REPLACE functions. My shameful record is 29. I’m not proud of it, but sometimes it’s the only way to get things done.

Not only are these nested REPLACE() functions difficult to write, but they are difficult to read too.

Instead of suffering through all of that ugly nesting, what you can do instead is use CROSS APPLY:

Click through for the example.  This is one of several great uses for the APPLY operator.

Categories

August 2017
MTWTFSS
« Jul Sep »
 123456
78910111213
14151617181920
21222324252627
28293031