2018-04-19 – Curated SQL

Using Have I Been Pwned In R

Published 2018-04-19 by Kevin Feasel

Maelle Salmon shows us how to use the HIBPwned library in R:

The alternative title of this blog post is HIBPwned version 0.1.7 has been released! W00t!. Steph’s HIBPwned package utilises the HaveIBeenPwned.com API to check whether email addresses and/or user names have been present in any publicly disclosed data breach. In other words, this package potentially delivers bad news, but useful bad news!

This release is mainly a maintenance release, with some cool code changes invisible to you, the user, but not only that: you can now get account_breaches for several accounts in a data.frame instead of a list, and you’ll be glad to know that results are cached inside an active R session. You can read about more functionalities of the package in the function reference.

Wouldn’t it be a pity, though, to echo the release notes without a nifty use case? Another blog post will give more details about the technical aspects of the release, but here, let’s make you curious! How many CRAN package maintainers have been pwned?

Read on to find out that answer.

Comments closed

Jupyter Notebooks In Azure

Published 2018-04-19 by Kevin Feasel

Steve Jones looks at using Jupyter Notebooks in Azure:

There’s a new feature in Azure, and I stumbled on it when someone posted a link on Twitter. Apologies, I can’t remember who, but I did click on the Azure Notebooks link and was intrigued. I’ve gotten Jupyter notebooks running on my local laptop, but these are often just on one machine. Having a place to share a notebook in the cloud is cool.

Once I clicked on the link, I found these are both R and Python notebooks, as well as F#. These allow you to essentially build a page of code and share it. It’s kind of like a REPL, kind of like a story. It’s a neat way of working through a problem. I clicked the Get Started link to get going and was prompted for a User ID.

I’m a major fan of using notebooks for validating results as well as training people.

Comments closed

Index Design When Handling Sorts

Published 2018-04-19 by Kevin Feasel

Erik Darling walks us through some of the nuance of index deisgn:

When tuning queries that need to sort large amounts of data, sometimes it makes sense to stick the ordering elements as the leading key column(s) in your index. This allows SQL Server to easily sort your data by that column, and then access other key and included columns to satisfy other parts of the query, whether they’re joins, predicates, or selected columns.

That doesn’t mean that indexes with the sorting element first is always the best idea. Depending on how your query looks, you can sometimes put sorting elements at the end of an index definition and still have your sort be supported.

Read on for an example.

Comments closed

Running The SQL Server Features Discovery Report

Published 2018-04-19 by Kevin Feasel

Dave Mason shows us how to run the SQL Server features discovery report via command prompt and PowerShell:

I don’t need to validate SQL Server installations on a regular basis. When the need arises, my preference is to run the SQL Server features discovery report. Further, I prefer to run it from the command line. After looking up command line parameters one too many times, I decided to script it out.

It turns out the script commands are a little more complicated than I realized: there is a different setup.exe file for each version of SQL Server installed. I ended up making two script versions: a DOS batch file with hard-coded paths, and a PowerShell script that’s more robust. Check them out and let me know what you think. (Keep scrolling down for a report sample image.)

I’m not sure I’ve ever run that report, but now I know how to do it from Powershell.

Comments closed

Substitution Variables In Power Query

Published 2018-04-19 by Kevin Feasel

Doug Burke gives us an example of here substitution variables make our lives easier in Power Query:

A substitution variable substitutes a variable (a changing value) to get a different result

    a + b = c (where ‘a’ and ‘b’ are substitution variables that define value ‘c’)

        If a = 5 and b = 2 then c = 7

        If a = 25 and b = 9 then c = 34

SubVars are especially helpful when

you need to move Power Query files from one file path, folder and file name to something completely different

you want to use the same PQ reports for different time periods (ie changing months)

It involves looking for changeable items. If something may or will change in the future, it’s a candidate for a substitution variable.

file paths are good for subVars
so are time elements such as Year or Month
you can also combine subVars where both the file path and month may change

Click through for several good examples.

Comments closed

What Read Committed Isolation Level Gets You

Published 2018-04-19 by Kevin Feasel

Paul Randal explains the answer, which is “not much”:

The ‘weird’ behavior is that when the “Batch 2” select completes, after having been blocked by the “Batch 1” transaction, it doesn’t return all 1,000 rows (even though “Batch 1” has completed). Furthermore, depending on when the “Batch 2” select is started, during the 10-seconds that “Batch 1” executes, “Batch 2” returns different numbers of rows. This behavior had also been reported on earlier versions of SQL Server as well. It’s easy to reproduce on SQL Server 2016/2017 and can be reproduced in all earlier versions with a single configuration change (more details in a moment).

Additionally, if the table has a clustered index created, 1,000 rows are returned every time, on all versions of SQL Server.

So why is this weird? Many people expect that all 1,000 rows will be returned every time AND that the structure of the table or the version of SQL Server should not make any difference.

Unfortunately, that assumption is not correct when using read committed.

Read Committed is a trade-off, not an ideal.

Comments closed

The Joys Of Boolean Logic

Published 2018-04-19 by Kevin Feasel

Jen McCown explains boolean logic via truth tables:

t’s really fine if all the circuitry and algebra stuff makes no sense to you. We’re going to use a tried-and-true method to figuring out how these things will come out.

For a truth table, you just put in every combination of input types – meaning, inputs that will evaluate to true, and those that evaluate to false – and work out how the clause will evaluate it overall.

What do I mean? Well, we have two inputs for the questions above: StatusID and UserForeignID, which I’ll shorten to ID and ForeignIDto save characters. Logically speaking:

ID can either be equal to 1, or to a value other than 1.
ForeignID can either be equal to TD75R, or to a value other than TD75R.

A logic course is particularly helpful in these cases, but start by reading the whole thing.

Comments closed

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Day: April 19, 2018

Using Have I Been Pwned In R

Jupyter Notebooks In Azure

Index Design When Handling Sorts

Running The SQL Server Features Discovery Report

Substitution Variables In Power Query

What Read Committed Isolation Level Gets You

The Joys Of Boolean Logic