Day: March 14, 2018

Levels And Unique In R

Eric Cai demonstrates the difference between levels() and unique() when dealing with factors in R:

The new data set “iris2” does not have any rows containing “setosa” as a possible value of “Species”, yet the levels() function still shows “setosa” in its output.

According to the user G5W on Stack Overflow, this is a desirable behaviour for the levels() function. Here is my interpretation of the intent of the creators of base R: The possible values of a factor are fundamental attributes of that variable, which should not be altered because of changes in the data.

There’s some back-and-forth in the comments; my takeaway is that both are useful functions depending upon what, exactly, you want to learn.
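
To make the difference concrete, here is a minimal sketch in base R. It assumes iris2 was built by filtering the setosa rows out of the built-in iris data set; the excerpt does not say how Eric constructed it, and iris3 is a name made up for this illustration.

```r
# Assumption: iris2 is the built-in iris data with the setosa rows filtered out.
iris2 <- iris[iris$Species != "setosa", ]

# levels() reports every value the factor is allowed to take,
# so "setosa" is still listed even though no row contains it.
levels(iris2$Species)

# unique() reports only the values that actually occur in the data.
unique(iris2$Species)

# If the unused level is unwanted, droplevels() removes it explicitly.
# iris3 is a hypothetical name used only for this sketch.
iris3 <- droplevels(iris2)
levels(iris3$Species)
```

The call to droplevels() is the opt-in step: base R keeps the full set of levels until you ask it to discard the unused ones.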

Row Goals And Anti-Joins

Paul White continues his row goals series:

The optimizer assumes that people write a semi join (indirectly e.g. using EXISTS) with the expectation that the row being searched for will be found. An apply semi join row goal is set by the optimizer to help find that expected matching row quickly.

For anti join (expressed e.g. using NOT EXISTS) the optimizer’s assumption is that a matching row will not be found. An apply anti join row goal is not set by the optimizer, because it expects to have to check all rows to confirm there is no match.

If there does turn out to be a matching row, the apply anti join might take longer to locate this row than it would if a row goal had been used. Nevertheless, the anti join will still terminate its search as soon as the (unexpected) match is encountered.

Another very interesting part of the series and well worth the time to read.

How LSNs Get Generated

Stuart Moore looks at how SQL Server builds log sequence numbers:

If you’ve ever dug down in the SQL Server transaction logs or had to build up restore chains, then you’ll have come across Log Sequence Numbers (LSNs). Ever wondered why they’re so large, why they all look suspiciously the same, why they don’t start from 0, and just how SQL Server generates these LSNs? Well, here we’re going to take a look at them.

Below we’ll go through examples of how to look inside the current transaction log, and backed up transaction logs. This will involve using some DBCC commands and the undocumented fn_dblog and fn_dump_dblog functions. The last two are very handy for digging into SQL Server internals, but be wary about running them on a production system without understanding what’s going on. They can leave file handles and processes behind that can impact your system.

It’s an interesting look into SQL Server’s internals.

Using Biml With Azure Data Factory v2

Ben Weissman shows how you can use BimlStudio to build ADF v2 flows:

As you may have seen at PASS Summit 2017 or another event, with the announcement of Azure Data Factory v2 (ADF), Biml will natively support ADF objects.

Please note that the native support is currently only available in BimlStudio 2018. If you’re using BimlExpress, you can still generate the JSON for your pipelines, datasets, etc. using Biml, but you cannot use the newly introduced tags.

The really good parts are only available in the paid product; if you do a lot of Azure Data Factory work, that might tip the scales in favor of getting BimlStudio.

Excluding Checks With dbachecks

Garry Bargsley shows us how to set a config which lets us exclude particular checks when running dbachecks:

While tweaking my Invoke-DbcCheck, the list of -ExcludeCheck checks keeps growing and growing.

Sure does make for a long command line to scroll thru.

Click through to see how to save these excluded checks in a configuration file.

Meidinger’s Law

Eugene Meidinger shares his thoughts on the future:

Since we are prognosticating, I want to take a guess at one of the constraints limiting the future.  I present you with Meidinger’s law:

An industry’s growth is constrained by how much your junior dev can learn in two years.

Let me explain. On my team, one of our developers just left for a different company. We also have a college student who will be going full time in May, upon graduation. How long do you think it’s going to take the new guy to get up to speed?

And how long do you think he’s going to stay?

I think this is a useful dictum, one which explains a good amount of industry movement.

Power BI Licensing Costs

Jason Thomas has put together a great Power BI report:

I thought it might be useful for some enterprise customers to see what the total cost is going to be for 3 years, and decided to share it here. You can use this guide to do the following:

  1. Forecast the growth in % for Pro, Frequent and Occasional users
  2. Get the total cost for 3 years based on the growth
  3. See the per user cost for each year
  4. Also, see the estimated utilization of the last Premium node, which will give you a good idea on whether you are close to upgrading or not

This is rather useful for long-term planning.
