Press "Enter" to skip to content

Month: December 2016

SARGability And Date Math

Erik Darling explains performance ramifications of date math operations:

Both plans are helped by our thoughtfully named index on the OrderDate column, though the one with cheaper estimated cost is the bottom one. Yes, I know this can sometimes lie, but we’re not hiding any functions in here that would throw things off horribly. If you’re concerned about the Nested Loops join, don’t worry too much. There is a tipping point where the Constant Scan operator is removed in favor of just index access. I didn’t inflate this table to find exact row counts for that, but I’ve seen it at work elsewhere. And, yeah, the second query will still be cheaper even if it also scans.

Which side of the equation you put a function on can make all the difference.

Comments closed

Identity Not Found

Melissa Coates explains Azure Active Directory tenancy to solve an Azure Analysis Services error:

The Analysis Services product team explained to me that a a user from a tenant which has never provisioned Azure Analysis Services cannot be added to another tenant’s provisioned server. Put another way, our Corporate tenant had never provisioned AAS so the Development tenant could not do so via cross-tenant guest security.

One resolution for this is to provision an AAS server in a subscription associated with the Corporate tenant, and then immediately delete the service from the Corporate tenant. Doing that initial provisioning will do the magic behind the scenes and allow the tenant to be known to Azure Analysis Services. Then we can proceed to provision it for real in the Development tenant.

Read the whole thing.

Comments closed

Lists And Ranges In Powershell And T-SQL

Phil Factor has an interesting post on lists and ranges in two languages:

When a  list like ‘1,3,5,6,9’, or ’12 Jan 2016,14 Jan 2016, 18 Feb 2016’  contains a datatype that can be unambiguously sorted in the order of the values of the datatype, it becomes possible to imply a range. This will trim unwieldy lists significantly if they have a lot of contiguous values.  ‘1,2,3,4,5’ can be expressed as 1-5 (or 1..5). The starting integer is separated from the end integer in the range by a dash sign. This representation rather clashes with the minus sign, represented by the same symbol, but the comma removes the ambiguity. A number followed immediately by a ‘-‘ means that the ‘-‘is a range symbol.  As with SQL’s BETWEEN clause that selects rows, the range includes all the integers in the interval including both endpoints. Because the range syntax is intended to be a more compact form,  it is generally only used where there are at least three contiguous values.

Interesting article.  I recommend checking it out.

Comments closed

Powershell Runspaces

Chrissy LeMaire has a pair of posts on runspaces.  First, a post which includes a runspace template:

Runspaces can be intimidating. I had heard about them, took a look at the code and was like “Ah, that looks complex. I’ll learn that later.” Because of this, I repeatedly went over the bulk insert code to familiarize the audience with the functionality that I was eventually going to multi-thread.

Then she shows us how to output results without waiting for all runspaces to complete:

Recently, I had a request to add multi-threading to Read-DbaBackupHeader. This was the first runspace in which I had to output the results from the runspace to the host — usually I just needed commands to run in the background, like with bulk-inserts to SQL Server.

Runspaces seems to be a good option in Powershell for multithreading.

Comments closed

Powershell Remoting To Linux

Max Trinidad shows that Powershell remoting works for connecting Linux to Windows and vice versa:

Connecting Linux to Windows

This is strange but, more likely, you will get an error the first time you try to connect. Just try the second try and it will work.

I need to see what Powershell objects for Linux currently exist; my guess is “not many, if any” but as those start getting fleshed out, I think even the most adamant of grep-sed-awk users will want to pick up at least a little bit of Powershell.

Comments closed

Elastic Database Pools

Arun Sirpal describes Azure elastic database pools:

The key to using elastic database pools is that you must understand the characteristics of the databases involved and their utilisation patterns, if you do not understand this then the idea of using an elastic database pool may cause problems.

The maximum amount my pool has is 100 eDTUs, I know for a fact that the S2 databases will not be used at the same time, the other S0 databases might be used at the same time at the most 3 of them at the same time. Basically what I am saying here is that I know that when the databases concurrently peak I know that it will not go beyond the 100 eDTU limit.

One thing that Arun does not mention is the relative ease of interconnecting databases within a pool, so even if it doesn’t end up being cheaper on net, that might be a benefit worth having.

Comments closed

Custom R Visuals In Power BI

Ginger Grant notes that there are R-powered custom visuals for Power BI:

Interacting with R visuals works differently than with other report visualizations as you cannot click on elements within the visualization and filter other items on the page. Other visuals on the page will filter the data contained within the R visual. For example, let’s say my report contains a total field, a slicer which contains years and a correlation plot which contains products. If the slicker is changed to select a year, total field and the data within the R visual will change to reflect that. If on the other hand, I choose to click on the R visual to select one of the product categories, the total field will not change and the R visual will not change. The R visual’s appearance will not change in any way.

Read on for more.

Comments closed

Data Wrangling: R Versus M

Ryan Wade argues that R is a better language choice for working with data in Power BI than M:

Now let’s do something that I think is pretty slick. Let’s create a data set that combines the home games of the Pacers (IND) and the home games of the Hawks (ATL). Given the naming convention used by the files we will have to identify the files in our working directory that starts with an eight numeric digits > then a period > then a 3 character team abbreviation for the away team > then either “ATL” or “IND” > then finally “.csv”. We can create a regular expression to find the files that matches that pattern. I did so in the code below:

I’m interested in catching the rest of the series.  This is a controversial statement that I’m not entirely sold on yet, but Ryan does set the stage for his full argument.

Comments closed

Tabulizer

Troy Walters uses the Tabulizer package to extract tables from a PDF and turn them into an R matrices or data frames:

Next we will use the extract_tables() function from tabulizer. First, I specify the url of the pdf file from which I want to extract a table. This pdf link includes the most recent data, covering the period from July 1, 2016 to November 25, 2016. I am using the default parameters for extract_tables. These are guess and method. I’ll leave guess set to TRUE, which tells tabulizer that we want it to figure out the locations of the tables on its own. We could set this to FALSE if we want to have more granular control, but for this application we don’t need to. We leave the method argument set to “matrix”, which will return a list of matrices (one for each pdf page). This could also be set to return data frames instead.

This is nice.  I have to imagine it only works for text-based PDFs and not ones which are generated from a series of images.

Comments closed

Solving The German Tank Problem

Frank Portman shows how to figure out how many taxicabs—or tanks—there are:

For the uninitiated, the Taxicab / Germany Tank problem is as follows:

Viewing a city from the train, you see a taxi numbered x. Assuming taxicabs are consecutively numbered, how many taxicabs are in the city?

This was also applied to counting German tanks in World War II to know when/if to attack. Statstical methods ended up being accurate within a few tanks (on a scale of 200-300) while “intelligence” (unintelligence) operations overestimated numbers about 6-7x. Read the full details on Wikipedia here (and donate while you’re over there).

Click through for the solution and how to implement it in R.

Comments closed