Press "Enter" to skip to content

Author: Kevin Feasel

Row Goals And Anti-Joins

Paul White continues his row goals series:

The optimizer assumes that people write a semi join (indirectly e.g. using EXISTS) with the expectation that the row being searched for will be found. An apply semi join row goal is set by the optimizer to help find that expected matching row quickly.

For anti join (expressed e.g. using NOT EXISTS) the optimizer’s assumption is that a matching row will not be found. An apply anti join row goal is not set by the optimizer, because it expects to have to check all rows to confirm there is no match.

If there does turn out to be a matching row, the apply anti join might take longer to locate this row than it would if a row goal had been used. Nevertheless, the anti join will still terminate its search as soon as the (unexpected) match is encountered.

Another very interesting part of the series and well worth the time to read.

Comments closed

How LSNs Get Generated

Stuart Moore looks at how SQL Server builds log sequence numbers:

If you’ve ever dug down in the SQL Server transaction logs or had to build up restore chains, then you’ll have come across Log Sequence Numbers (LSNs). Ever wondered why they’re so large, why they all look suspiciously the same, why don’t they start from 0 and just how does SQL Server generate these LSNs? Well, here we’re going to take a look at them

Below we’ll go through examples of how to look inside the current transaction log, and backed up transaction logs. This will involve using some DBCC commands and the undocumented fn_dblog and fn_dump_dblog function. The last 2 are very handy for digging into SQL Server internals, but be wary about running them on a production system without understanding what’s going on. They can leave filehandles and processes behind that can impact on your system.

It’s an interesting look into SQL Server’s internals.

Comments closed

Using Biml With Azure Data Factory v2

Ben Weissman shows how you can use BimlStudio to build ADF v2 flows:

As you may have seen at PASS Summit 2017 or another event, with the announcement of Azure Data Factory v2 (adf), Biml will natively support adf objects.

Please note, that the native support is currently only available in BimlStudio 2018. If you’re using BimlExpress, you can still generate the JSON for your pipelines, datasets etc. using Biml but you cannot use the newly introduced tags.

The really good parts are only available in the paid product; if you do a lot of Azure Data Factory work, that might tip the scales in favor of getting BimlStudio.

Comments closed

Excluding Checks With dbachecks

Garry Bargsley shows us how to set a config which lets us exclude particular checks when running dbachecks:

While tweaking my Invoke-DbcCheck  the list of  -ExcludeCheck checks keeps growing and growing.

Sure does make for a long command line to scroll thru.

Click through to see how to save these excluded checks in a configuration file.

Comments closed

Meidinger’s Law

Eugene Meidinger shares his thoughts on the future:

Since we are prognosticating, I want to take a guess at one of the constraints limiting the future.  I present you with Meidinger’s law:

An industry’s growth is constrained by how much your junior dev can learn in two years.

Let me explain. On my team, one of our developers’ just left for a different company. We also have a college student who will be going full time in May, upon graduation. How long do you think it’s going to take the new guy to get up to speed?

And how long do you think he’s going to stay?

This I think is a useful dictum which explains a pretty good amount of industry movement.

Comments closed

Power BI Licensing Costs

Jason Thomas has put together a great Power BI report:

I thought it might be useful for some enterprise customers to see what the total cost is going to be for 3 years, and decided to share it here. You can use this guide to see some of the additional information like:-

  1. Forecast the growth in % for Pro, Frequent and Occasional users
  2. Get the total cost for 3 years based on the growth
  3. See the per user cost for each year
  4. Also, see the estimated utilization of the last Premium node, which will give you a good idea on whether you are close to upgrading or not

This is rather useful for long-term planning.

Comments closed

XGBoost With Python

Fisseha Berhane looked at Extreme Gradient Boosting with R and now covers it in Python:

In both R and Python, the default base learners are trees (gbtree) but we can also specify gblinear for linear models and dart for both classification and regression problems.
In this post, I will optimize only three of the parameters shown above and you can try optimizing the other parameters. You can see the list of parameters and their details from the website.

It’s hard to overstate just how valuable XGBoost is as an algorithm.

Comments closed

Domain, Range, And Codomain

Kevin Sookocheff explains the concepts of domain, range, and codomain:

That is, a function relates an input to an output. But, not all input values have to work, and not all output values. For example, you can imagine a function that only works for positive numbers, or a function that only returns natural numbers. To more clearly specify the types and values of a functions input and output, we use the terms domain, range, and codomain.

Speaking as simply as possible, we can define what can go into a function, and what can come out:

  • domain: what can go into a function

  • codomain: what may possibly come out of a function

  • range: what actually comes out of a function

Read on for more, including a couple of examples.  These are important concepts for learning functional programming.

Comments closed

Experimenting With The Data Professional Salary Survey

Mala Mahadevan investigates a potential correlation in the data professional salary survey:

The questions I was looking at are as below:
1 Is there any correlation between experience and number of hours worked?
2 Is there any correlation between experience and job duties/kinds of tasks performed?
3 Is there any correlation between experience and managing staff – ie – do more people with experience take to management as a form of progress?

I am using this blog post to explore question 1.

Click through to see if there is a correlation between experience and hours worked.  One critique I have is that years of experience is not normally distributed:  there’s a hard cutoff at 0, so although the possible range does follow what a hypothetical normal distribution would do (and it doesn’t really affect the analysis Mala did), that difference can be important in other analyses.

Comments closed

Linking Azure VMs To An On-Prem Domain

Denny Cherry explains how to integrate Azure VMs with your existing Active Directory domain:

The first step is to put some domain controllers in Azure.  To do this, you’ll need a site to site VPN between Azure and your on-premises environment.  If you have multiple on-premises sites, then you’ll want to create a VPN between Azure and all your on-premises environments.  If your Azure environment is hosted in multiple regions, then you’ll want to create a mesh network when each on-premises site in VPNed into all of your vNets.  You’ll probably also want your vNets VPNed to each other (Peering of your networks between sites may be an option as well depending on how you’ve set things up).  If you have an extremely large number of users at your site, then Express Route might be something worth looking into instead of a site to site VPN.

Click through for the full process.

Comments closed