Curated SQL – Page 1028 – A Fine Slice Of SQL Server

The user wants to unpivot the data by rotating the three header rows (Scenario Type, Month, and Year) from columns to rows. The issue is that the headers span three rows. If you just select these columns and unpivot, you’ll end up with a mess. And Power Query operates on row at the time so you can’t reference previous rows, such as to concatenate Scenario, Month, and Year. We can do the concatenation in Excel so we have one row with column headers, such as Actuals-Jan-2018, Actuals-Feb-2018, and so on, which we can easily unpivot in Power Query. But if we can’t or don’t want to modify the Excel file, such as to avoid the same steps every time a new file comes in?

Click through for a sample file which shows how you can do this.

Comments closed

Building an AKS Cluster

Published 2019-06-24 by Kevin Feasel

Mohammad Darab continues a series on Big Data Clusters by creating a Kubernetes pod in Azure Kubernetes Service:

Next, we will create a resource group by executing the following command:
az group create –name nameOfMyresourceGroup –location eastus2
Once you execute the above command, you can go into the Azure portal and refresh your resource group pane and see the newly created resource group.
Once that is setup, it’s time to create the actual Kubernetes cluster.

Click through for the full set of instructions.

Comments closed

Building a Power BI Accordion Filter

Published 2019-06-24 by Kevin Feasel

David Eldersveld builds out a Power BI accordion filter:

The Power BI custom accordion relies on Bookmarks and Buttons as key elements. I’ve only created two categories in my accordion. I’ll be honest–it’s probably more work than it’s worth to keep track of different buttons due to positions as well as what’s visible or hidden for each bookmark. The thought of expanding to three categories is a bit daunting. Why is that?

Read on to see why (hint: combinatorial explosion).

Comments closed

Importing Biml Metadata from Excel

Published 2019-06-24 by Kevin Feasel

David Stein shows how you can take table and column data from Excel and use it to populate Biml flows:

Excel Spreadsheets as a metadata source have a lot going for them.
– Everyone uses Excel and is comfortable with it.
– Excel is incredibly customizable and versatile.
– Excel offers data validation and filtering.
For these reasons, I create customized Excel spreadsheet that function as a lite Graphic User Interface (GUI) for metadata. Of course, Excel isn’t a perfect metadata source. For one thing, you have to own a licensed copy of Excel. Second, because spreadsheets are so easy to customize, users sometimes “improve” them further and break your code.

Read on for an example.

Comments closed

Using Cohen’s D for Experiments

Published 2019-06-21 by Kevin Feasel

Nina Zumel takes us through Cohen’s D, a useful tool for determining effect sizes in experiments:

Cohen’s d is a measure of effect size for the difference of two means that takes the variance of the population into account. It’s defined as
d = | μ₁ – μ₂ | / σ_pooled
where σ_pooled is the pooled standard deviation over both cohorts.

Read the whole thing.

Comments closed

Comparing Iterator Performance in R

Published 2019-06-21 by Kevin Feasel

Ulrik Stervbo has a performance comparison for for, apply, and map functions in R:

It is usually said, that for– and while-loops should be avoided in R. I was curious about just how the different alternatives compare in terms of speed.
The first loop is perhaps the worst I can think of – the return vector is initialized without type and length so that the memory is constantly being allocated.

The performance of map isn’t great, though the benefits to me are less about performance and more about readability. H/T R-bloggers

Comments closed

Calculating Value Above Replacement

Published 2019-06-21 by Kevin Feasel

Rob Collie explains how to calculate Value Above Replacement:

Let’s say you’re trying to improve the fuel efficiency of your car, and you have a budget of $100 to make those improvements. Furthermore, let’s say that you can only spend your budget on three different kinds of improvements: better tires, better spark plugs, and a better exhaust system.
The simple answer, of course, is to buy the Best of each part – the best Tires, the best Sparkplugs, and the best Exhaust System.
But what if Reality rears its head and rudely insists that the Best Costs More? What if there are a range of options available in each bucket, with varying price AND effectiveness? Then what?

Read on for the solution which has not nearly enough baseball.

Comments closed

An Intro to SQL Server Big Data Clusters

Published 2019-06-21 by Kevin Feasel

Mohammad Darab has a series on Big Data Clusters. Part zero explains what they are:

The *absolute* unique feature that BDCs offer that *no* other company or product offers is: Data Virtualization
The new and enhanced SQL Server 2019 Polybase feature comes with connectors to many different data sources: Oracle, Teradata, Apache Spark, MongoDB, Azure Cosmos DB, and even ODBC connectivity to IBM’s DB2, SAP HANA and excel (see image below)

Part one shows how to set one up:

So far, Microsoft does not have a simple way to create a Big Data Cluster. It’s a bit cumbersome of a process and the learning curve is a bit steep. However, Microsoft is currently working on making it easier to deploy a Big Data Cluster via Notebook in Azure Data Studio and eventually some type of “deployment wizard.” But for now, the only option is to do it the long way.

The series will continue, but check out the setup work.

Comments closed

Replaying Workloads with WorkloadTools

Published 2019-06-21 by Kevin Feasel

Gianluca Sartori shows an example of using the WorkloadTools application to replay a workload, including where the analytics server cannot directly access the production database:

Regardless of the method that you decided to use, at the end of the replays, you will have two distinct sets of tables containing the workload analysis data, sitting in different schemas in the same database or in completely different databases.
WorkloadViewer will let you visualize performance over time, as we have seen for a single workload analysis, but this time it will be able to show you data from both workloads, so that you can compare them.

This sort of production load testing is both important and difficult; WorkloadTools makes it easier.

Comments closed

Troubleshooting Query Performance Changes

Published 2019-06-21 by Kevin Feasel

Erin Stellato walks us through a troubleshooting guide when users complain about poorly-performing queries:

This is tale of troubleshooting…
When you unexpectedly or intermittently encounter a change in query performance, it can be extremely frustrating and troublesome for a DBA or developer. If you’re not using Query Store, a third-party application, or your own method to capture query data, then you probably don’t have information about what query performance looked like when things were good…you just know how it’s running now. I was working with a customer of Tim’s last week that had been chasing this exact problem for, in their words, years. They had recently upgraded to SQL Server 2016, and the problem was still occurring.

Strangely, “blame the network” didn’t appear in Erin’s post, so I don’t know if it’s comprehensive.

Comments closed

M	T	W	T	F	S	S
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29	30	31

Curated SQL Posts